Update Azure Storage SDK to v12, add logging for Azure Blob storage directory #198
…ob-v1 # Conflicts: # build/build.xml
…ob-v1 # Conflicts: # src/Examine.AzureDirectory/AzureDirectory.cs # src/Examine.AzureDirectory/Properties/AssemblyInfo.cs # src/Examine/Properties/AssemblyInfo.cs
…ons. Refactor to simplify code/ improve code reuse. Change framework to 4.6.1 for azuredirectory, test and demo projects as required by blob storage package
…h namespace. Integration test is not very reliable with the azure storage emulator.
@nzdev Great job on the blob provider update, but I don't really agree that we should include a logger. Examine is a low-level API abstraction, and logging should live in the layer that uses it, not inside the low-level abstraction :)
Thanks for the feedback @bielu. I hope we can work together to get AzureDirectory working well 😁. I do think adding logging is appropriate, and I've made providing the logger instance opt-in. I believe it's important to have logging within this project because the exceptions are handled within this layer. Without logging at this layer it's going to be difficult to know what happened when things go wrong.
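To illustrate the opt-in idea, here is a minimal sketch of a directory class that swallows exceptions internally and only logs when the caller supplies a logger. All type names (`IDirectoryLogger`, `NullDirectoryLogger`, `ListDirectoryLogger`, `RemoteDirectorySketch`) are hypothetical and are not taken from this PR or from Examine.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical logging abstraction; not an Examine or Azure SDK type.
public interface IDirectoryLogger
{
    void Error(Exception ex, string message);
}

// Default used when the caller does not supply a logger: does nothing.
public sealed class NullDirectoryLogger : IDirectoryLogger
{
    public void Error(Exception ex, string message) { }
}

// Demo logger that keeps messages in memory.
public sealed class ListDirectoryLogger : IDirectoryLogger
{
    public List<string> Messages { get; } = new List<string>();

    public void Error(Exception ex, string message)
    {
        Messages.Add(message + ": " + ex.Message);
    }
}

// Hypothetical stand-in for a directory class that handles exceptions internally.
public class RemoteDirectorySketch
{
    private readonly IDirectoryLogger _logger;

    // The logger is optional; passing nothing (or null) opts out of logging.
    public RemoteDirectorySketch(IDirectoryLogger logger = null)
    {
        _logger = logger ?? new NullDirectoryLogger();
    }

    // Runs an operation and reports failure without throwing,
    // so the log entry is the only trace of what went wrong.
    public bool TryRun(Action operation, string description)
    {
        try
        {
            operation();
            return true;
        }
        catch (Exception ex)
        {
            _logger.Error(ex, "Failed: " + description);
            return false;
        }
    }
}
```

Because the exception never leaves `TryRun`, a consuming layer has nothing to log unless the directory itself records the failure, which is the argument made above.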
@Shazwazza I've done a bit of refactoring in this branch to clean things up, but any logic issues or required comments would be better raised in the other branch. I'm not sure whether there would need to be two releases of AzureDirectory, as .NET 4.5 is not supported by the current Azure SDK.
AzureDirectory hasn't been released for v1, so we can do whatever needs to be done :) As for backporting it to 0.x, that may be a different story, but we'll keep these changes for v1.
In that case @Shazwazza, I would recommend going with this branch, as I've cleaned up and fixed a few things.
@nzdev I am actually also doing cleanup and refactoring, and we could end up with totally different outcomes :) Can you send the same PR to me? Then I will merge the code and we will finish with one PR instead of two, which will make the job much easier for @Shazwazza.
@bielu If you are on .NET 4.6, it may be worth branching off this branch instead, as it's already cleaned up and fixes some of the things pointed out by @Shazwazza. Otherwise, I'm happy to send a PR your way too.
I will look into that over the weekend :)
@nzdev I just went through your fork of my fork, and I am going to leave a lot of comments there.
Here are the few comments I had time to prepare. I still think the manifest idea is wrong, or at least wrongly implemented. In a load-balanced environment you don't want to be checking a flag, as it will cause delays; my approach checks on each request, which makes it almost real time.
I am going to build a beta package from that branch, check that it works and does not break in a load-balanced environment, and compare how much the reaction time differs.
}
public Lucene.Net.Store.Directory CacheDirectory { get; protected set; }

public abstract string[] CheckDirtyWithoutWriter();
same as above
public abstract string[] CheckDirtyWithoutWriter();

public abstract void SetDirty();
same as above
public class ExamineIndexWriter : IndexWriter
{
    private ExamineDirectory _examineDirectory;
    public ExamineIndexWriter(Directory d, Analyzer a, MaxFieldLength mfl) : base(d, a, mfl)
I don't think creating this makes sense.
@@ -78,13 +93,26 @@ public class LuceneIndex : BaseIndexProvider, IDisposable, IIndexStats
DefaultAnalyzer = analyzer ?? new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);

_directory = luceneDirectory;
if (luceneDirectory is ExamineDirectory dir)
Same as above
@@ -422,7 +469,14 @@ public void EnsureIndex(bool forceOverwrite)

//remove all of the index data
_writer.DeleteAll();
_writer.Commit();
if(_writer is ExamineIndexWriter examineIndexWriter)
I still don't think the idea of a custom writer and manifest makes sense at all.
}

public override void DeleteFromIndex(IEnumerable<string> itemIds)
We should cancel the operation in OnDocumentWriting instead.
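As a rough illustration of that suggestion, the sketch below cancels writes from a single cancellable hook rather than overriding each write method. `IndexSketch` and its members are hypothetical stand-ins; only `System.ComponentModel.CancelEventArgs` is a real .NET type, and the actual Examine event signature may differ.

```csharp
using System;
using System.ComponentModel;

// Hypothetical stand-in for an index that raises a cancellable event before writing.
public class IndexSketch
{
    public bool IsReadOnly { get; set; }
    public int DocumentsWritten { get; private set; }

    // CancelEventArgs is the standard .NET type for cancellable events.
    public event EventHandler<CancelEventArgs> DocumentWriting;

    protected virtual void OnDocumentWriting(CancelEventArgs e)
    {
        // Assumption: a read-only index cancels every write in this one place.
        if (IsReadOnly) e.Cancel = true;
        DocumentWriting?.Invoke(this, e);
    }

    public void Write(string itemId)
    {
        var args = new CancelEventArgs();
        OnDocumentWriting(args);
        if (args.Cancel) return; // cancelled: nothing is written
        DocumentsWritten++;
    }
}
```

With this shape, callers keep using the normal write methods and the read-only rule stays in one place.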
@@ -71,13 +71,13 @@ protected ValueSetValidationResult ValidateItem(ValueSet item)
/// Validates the items and calls <see cref="M:Examine.Providers.BaseIndexProvider.PerformIndexItems(System.Collections.Generic.IEnumerable{Examine.ValueSet})" />
/// </summary>
/// <param name="values"></param>
public void IndexItems(IEnumerable<ValueSet> values)
public virtual void IndexItems(IEnumerable<ValueSet> values)
If we do as the comments above suggest, we don't need to override those methods, but I agree they should be virtual anyway :)
Hi all, sorry I haven't had time to follow up on this just yet (still getting back from holiday mode!). But I had a thought and I'm unsure what the solution is: I'm unsure whether the v11 and v12 packages can co-exist, does anyone know? Otherwise there's an ugly ILMerge route that could be taken, or else I think that media blob package will need to be updated too and this could only exist alongside that newer version. Thoughts?
@Shazwazza from what I know, v11 and v12 can coexist, but it is not recommended, and we should probably help update that package so it does not use deprecated APIs anyway. (This is based on the two versions being in different namespaces, with and without the Microsoft prefix.)
@Shazwazza after a few messages with @nzdev, we figured out that the best way of handling all providers for remote directories is to abstract out the APIs for Azure/S3 etc. Can you say which approach you would prefer for moving the abstractions?
Hi all,
At this stage I think you both know more about the requirements than I do, since my head hasn't been in this part of the codebase as much as yours have recently. Having a separate third package seems like a 'cleaner' solution, and it also means that if changes are needed within that package it can be deployed independently of an Examine version. But I'm happy to take your lead, what do you think?
I have left lots of comments/questions inline. I'm unsure about the ExamineIndexWriter change. It seems that all the logic this deals with really has nothing to do with the default LuceneEngine. Things like dirty checking, etc. are all to do with specific implementations of LuceneIndex. It seems like it would make more sense to have an inherited class deal with the stuff that is specific to 'Remote' directories.
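A minimal sketch of that inheritance idea, assuming the remote-specific concern is a dirty flag set on commit. `BaseIndexSketch` and `RemoteIndexSketch` are hypothetical names and do not mirror the real LuceneIndex API.

```csharp
using System;

// Hypothetical stand-in for the general-purpose index; it knows nothing about remote storage.
public class BaseIndexSketch
{
    public int Commits { get; private set; }

    public virtual void Commit()
    {
        Commits++;
    }
}

// Hypothetical subclass that owns everything specific to remote directories.
public class RemoteIndexSketch : BaseIndexSketch
{
    public bool IsDirty { get; private set; }

    public override void Commit()
    {
        base.Commit();
        // Remote-only concern (assumed): flag that other servers need to re-sync.
        IsDirty = true;
    }

    public void MarkSynced()
    {
        IsDirty = false;
    }
}
```

The base class then needs no type checks for remote directories, since the subclass carries that behaviour.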
catch(Azure.RequestFailedException ex) when (ex.Status == 409)
{
    //File already exists
    throw;
Is there a reason to catch this and then just re-throw? Or is this just for debugging if breakpointing?
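If the intent is only to observe the conflict and not to handle it, one option is an exception filter that logs and then declines to catch. This is a sketch under that assumption; `StorageRequestException` is a hypothetical stand-in for an SDK exception carrying an HTTP status code, and `UploadSketch` is invented for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an SDK exception that carries an HTTP status code.
public class StorageRequestException : Exception
{
    public int Status { get; }

    public StorageRequestException(int status, string message) : base(message)
    {
        Status = status;
    }
}

public static class UploadSketch
{
    public static List<string> Log { get; } = new List<string>();

    // Records the failure and returns false so the filter never matches;
    // the exception keeps propagating to the caller.
    private static bool LogConflict(StorageRequestException ex)
    {
        Log.Add("Conflict (409): " + ex.Message);
        return false;
    }

    public static void Upload(Action send)
    {
        try
        {
            send();
        }
        catch (StorageRequestException ex) when (ex.Status == 409 && LogConflict(ex))
        {
            // Never reached: LogConflict always returns false.
        }
    }
}
```

A plain `catch { throw; }` with no other statements behaves the same as having no catch at all, apart from giving a place to set a breakpoint.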
public Lucene.Net.Store.Directory CacheDirectory { get; protected set; }
Should this be protected set? This is a ctor dependency so should be readonly?
@Shazwazza it can't be readonly, as it is used and swapped out by the read-only mode; that is why it is protected. If you check the logic there:
CacheDirectory = new SimpleFSDirectory(directory);
This is to avoid corruption of indexes :)
Trace.WriteLine($"INFO Syncing file {fileName} for {RootFolder}");
// then we will get it fresh into local deflatedName
// StreamOutput deflatedStream = new StreamOutput(CacheDirectory.CreateOutput(deflatedName));
using (var deflatedStream = new MemoryStream())
deflatedStream here isn't the correct name; it's just the stream that the blob is downloaded to. I realize this was a copy/paste from other code, but it should be renamed. Even just memStream or whatever is fine.
blob.DeleteIfExists();
SetDirty();

Trace.WriteLine($"DELETE {_blobContainer.Uri}/{name}");
Trace.WriteLine($"INFO Deleted { _blobContainer.Uri}/{name} for {RootFolder}");
Do we need 2x trace here?
@@ -370,7 +392,7 @@ protected override void PerformIndexItems(IEnumerable<ValueSet> values, Action<I
public void EnsureIndex(bool forceOverwrite)
{
    if (!forceOverwrite && _exists.HasValue && _exists.Value) return;

    if (_directory is ExamineDirectory examineDirectory && examineDirectory.IsReadOnly) return;
If we keep this idea of ExamineDirectory, then this check should be encapsulated in a protected property IsReadOnly, since this same check, if (_directory is ExamineDirectory examineDirectory && examineDirectory.IsReadOnly) return;, is done a few times.
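A sketch of what such a property could look like, using hypothetical stand-in types (`DirectorySketch`, `ExamineDirectorySketch`, `LuceneIndexSketch`) in place of the real Lucene.Net and Examine classes:

```csharp
// Hypothetical stand-ins for the directory types discussed in this review.
public class DirectorySketch { }

public class ExamineDirectorySketch : DirectorySketch
{
    public bool IsReadOnly { get; set; }
}

public class LuceneIndexSketch
{
    private readonly DirectorySketch _directory;

    public int EnsureIndexRuns { get; private set; }

    public LuceneIndexSketch(DirectorySketch directory)
    {
        _directory = directory;
    }

    // The type check and the flag live in one place.
    protected bool IsReadOnly =>
        _directory is ExamineDirectorySketch examineDirectory && examineDirectory.IsReadOnly;

    public void EnsureIndex()
    {
        if (IsReadOnly) return;
        EnsureIndexRuns++;
    }
}
```

Each call site then shrinks to `if (IsReadOnly) return;`, and a later change to how read-only is determined touches only the property.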
@Shazwazza as pointed out earlier, I am not sure about that either :) but I didn't remove it from my PR to this PR. :)
if(_directory is ExamineDirectory examineDirectory)
{
    //Calling commit causes fdt,fdx,fnm,frq,nrm,prx,tii,tis,tvd,tvf,tvx files to be deleted.
    //TODO:
Is this code comment still relevant? Commit shouldn't remove files IIRC, that is done when merging occurs
@@ -1031,7 +1100,14 @@ public void ScheduleCommit()
if (_index._cancellationTokenSource.IsCancellationRequested)
{
    //perform the commit
    _index._writer?.Commit();
    if (_index._writer is ExamineIndexWriter examineIndexWriter)
This same check is performed a lot, would be better to encapsulate this logic so there's no duplication.
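One possible shape for that encapsulation is a single commit helper that call sites use in place of repeating the type check. The types below are hypothetical stand-ins, and what the special branch actually does in this PR is not shown in the diff, so setting a dirty flag is an assumption.

```csharp
// Hypothetical stand-ins for the writer types discussed in this review.
public class WriterSketch
{
    public int Commits { get; private set; }

    public virtual void Commit()
    {
        Commits++;
    }
}

public class ExamineWriterSketch : WriterSketch
{
    public bool DirtyFlagSet { get; private set; }

    public void SetDirty()
    {
        DirtyFlagSet = true;
    }
}

public static class WriterCommitSketch
{
    // Single entry point for committing; call sites no longer repeat the type check.
    public static void CommitAndFlag(WriterSketch writer)
    {
        if (writer == null) return;

        writer.Commit();

        // Assumed extra step for the custom writer only.
        if (writer is ExamineWriterSketch examineWriter)
        {
            examineWriter.SetDirty();
        }
    }
}
```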
{
    //TODO: if readonly index, unlock the local index?
I would assume so yes but the IndexWriter in that case should already just be the local index?
}
catch (Exception e)
{
    //It's the initial call to this at the beginning or after successful commit
This try/catch seems significant, was this caused somewhere during debugging? Should this try/catch be ported individually to another PR?
@Shazwazza after the refactor it is actually redundant, as it was only happening in read-only mode, and I changed the way read-only is handled so that it never occurs again. I left it in because while debugging I hit an issue where the old reader was already disposed and the new one was not ready yet, and adding that try/catch fixed about 10-15% of my crashes during debugging :)
@Shazwazza some of those comments will already be addressed if @nzdev merges my PR into this PR :)
Expanding on the great work by @bielu.
What?
Refactors AzureDirectory related classes to be more open for extension and deduplicates code.
Upgrades the Azure blob SDK to v12 as v11 is obsolete. This required changing the framework version for AzureDirectory, Tests and Demo projects to .NET 4.6.1
Why?
Make the Azure Blob Storage Directory easier to troubleshoot so it can be made more reliable.
How to test?
Run the Azurite emulator or the Azure Storage Emulator.
Run all the tests in the test project.