
Update Azure Storage SDK to v12, add logging for Azure Blob storage directory #198

Closed
nzdev wants to merge 48 commits

Conversation

@nzdev (Contributor) commented Dec 4, 2020

Expanding on the great work by @bielu.

What?
Refactors the AzureDirectory-related classes to be more open for extension and deduplicates code.
Upgrades the Azure Blob SDK to v12, as v11 is obsolete. This required changing the target framework for the AzureDirectory, Tests and Demo projects to .NET 4.6.1.

Why?
Makes the Azure Blob Storage directory easier to troubleshoot so that it can be made more reliable.

How to test?
Run the Azurite emulator or the Azure Storage Emulator.
Run all the tests in the test project.

Arkadiusz Biel and others added 25 commits November 8, 2020 00:02
…ob-v1

# Conflicts:
#	src/Examine.AzureDirectory/AzureDirectory.cs
#	src/Examine.AzureDirectory/Properties/AssemblyInfo.cs
#	src/Examine/Properties/AssemblyInfo.cs
…ons. Refactor to simplify code/ improve code reuse. Change framework to 4.6.1 for azuredirectory, test and demo projects as required by blob storage package
…h namespace.

Integration test is not very reliable with the azure storage emulator.
@bielu commented Dec 4, 2020

@nzdev Great job on updating the blob provider, but at the same time I don't really agree that we should include a logger. Examine is a low-level API abstraction, and logging should live in the layer that uses it, not inside the low-level abstraction :)

@nzdev (Contributor, Author) commented Dec 4, 2020

Thanks for the feedback @bielu. I hope we can work together to get AzureDirectory working well 😁.

I do think adding logging is appropriate, and I've made providing the logger instance opt-in. However, I believe it's important to have logging within this project because the exceptions are handled within this layer. Without logging at this layer, it's going to be difficult to know what happened when things go wrong.
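For illustration, here is a minimal sketch of what opt-in logging at this layer could look like. The type names (AzureDirectoryOptions, BlobSyncOperations) and the constructor shape are hypothetical, not the API in this PR; only the Microsoft.Extensions.Logging abstractions are assumed.

```csharp
// Sketch only: AzureDirectoryOptions and BlobSyncOperations are hypothetical names
// used to illustrate the opt-in idea; they are not types from this PR.
using System;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Abstractions;

public class AzureDirectoryOptions
{
    // Callers that want diagnostics supply a logger; everyone else gets the no-op instance.
    public ILogger Logger { get; set; } = NullLogger.Instance;
}

public class BlobSyncOperations
{
    private readonly ILogger _logger;

    public BlobSyncOperations(AzureDirectoryOptions options)
        => _logger = options?.Logger ?? NullLogger.Instance;

    public void OnSyncFailed(string fileName, Exception ex)
    {
        // The exception is handled at this layer, so without a supplied logger
        // this failure would be invisible to the caller.
        _logger.LogError(ex, "Failed to sync {FileName} from blob storage", fileName);
    }
}
```

With NullLogger.Instance as the default, callers that don't supply a logger see no behaviour change.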

@Shazwazza (Owner) commented
I have reviewed #197 and it seems to me those changes will need to be reviewed and merged before I can look at this one. There's a boatload of changes here and I just don't want to duplicate a lot of work. Are you both OK with that? I'll focus on #197 first, before this.

@nzdev (Contributor, Author) commented Dec 15, 2020

@Shazwazza I've done a bit of refactoring in this branch to clean things up, but any logic issues/comments would be better raised against the other branch. I'm not sure whether there would need to be two releases of AzureDirectory, since .NET 4.5 is not supported by the current Azure SDK.

@Shazwazza (Owner) commented
AzureDirectory hasn't been released for v1, so we can do whatever needs to be done :) As for back-porting it to 0.x, that may be a different story, but we'll keep these changes for v1.

@nzdev (Contributor, Author) commented Dec 15, 2020

In that case, @Shazwazza, I would recommend going with this branch, as I've cleaned up and fixed a few things.

@bielu commented Dec 15, 2020

@nzdev I am actually also doing cleanup and refactoring, so we could end up with totally different outcomes :). Can you send the same PR to me? Then I will merge the code and we will end up with one PR instead of two, which will make the job much easier for @Shazwazza.

@nzdev (Contributor, Author) commented Dec 15, 2020

If you are on .NET 4.6, @bielu, it may be worth branching off this branch instead, as it's already cleaned up and fixes some of the things pointed out by @Shazwazza. Otherwise, I'm happy to send a PR your way too.

@bielu commented Dec 15, 2020

I will look into that over the weekend :)

@bielu commented Dec 18, 2020

@nzdev I just went through your fork of my fork, and I am going to leave a lot of comments there.

@bielu left a comment

There are a few comments which I had time to prepare. I still think the idea of a manifest is wrong, or at least wrongly implemented. In a load-balanced environment you don't want to be checking a flag, as it will cause delays; my approach of checking on each request makes it almost real time.
I am going to build a beta package from that branch, check that it works and doesn't break in a load-balanced environment, and compare how much the reaction time differs.
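To make the trade-off concrete, here is a hypothetical sketch (none of these type names exist in Examine): a background-polled flag can lag behind the remote index by up to one polling interval, while checking remote state per request is near real time at the cost of an extra metadata call.

```csharp
using System;
using System.Threading;

// Hypothetical abstraction: compares e.g. a remote manifest/ETag against the local copy.
public interface IRemoteIndexState
{
    bool IsRemoteNewer();
}

// Strategy 1: a background poll sets a flag, so readers can lag behind the
// remote index by up to the polling interval.
public class PolledDirtyFlag : IDisposable
{
    private readonly IRemoteIndexState _state;
    private readonly Timer _timer;
    private volatile bool _dirty;

    public PolledDirtyFlag(IRemoteIndexState state, TimeSpan interval)
    {
        _state = state;
        _timer = new Timer(_ => _dirty = _state.IsRemoteNewer(), null, TimeSpan.Zero, interval);
    }

    public bool NeedsSync() => _dirty;

    public void Dispose() => _timer.Dispose();
}

// Strategy 2: check the remote state on every request, so changes are picked up
// almost immediately, at the cost of one extra remote metadata call per request.
public class PerRequestDirtyCheck
{
    private readonly IRemoteIndexState _state;

    public PerRequestDirtyCheck(IRemoteIndexState state) => _state = state;

    public bool NeedsSync() => _state.IsRemoteNewer();
}
```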

src/Examine/LuceneEngine/Directories/ExamineDirectory.cs (outdated comment, resolved)
}
public Lucene.Net.Store.Directory CacheDirectory { get; protected set; }

public abstract string[] CheckDirtyWithoutWriter();

bielu: same as above


public abstract string[] CheckDirtyWithoutWriter();

public abstract void SetDirty();

bielu: same as above

public class ExamineIndexWriter : IndexWriter
{
    private ExamineDirectory _examineDirectory;
    public ExamineIndexWriter(Directory d, Analyzer a, MaxFieldLength mfl) : base(d, a, mfl)

bielu: I don't think that creating this makes sense.

src/Examine/LuceneEngine/Providers/LuceneIndex.cs (outdated comment, resolved)
@@ -78,13 +93,26 @@ public class LuceneIndex : BaseIndexProvider, IDisposable, IIndexStats
DefaultAnalyzer = analyzer ?? new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);

_directory = luceneDirectory;
if (luceneDirectory is ExamineDirectory dir)

bielu: Same as above

src/Examine/LuceneEngine/Providers/LuceneIndex.cs (outdated comment, resolved)
@@ -422,7 +469,14 @@ public void EnsureIndex(bool forceOverwrite)

//remove all of the index data
_writer.DeleteAll();
_writer.Commit();
if(_writer is ExamineIndexWriter examineIndexWriter)

bielu: I still don't think the idea of a custom writer and manifest makes sense at all.

}

public override void DeleteFromIndex(IEnumerable<string> itemIds)

bielu: We should cancel the operation in OnDocumentWriting instead.

@@ -71,13 +71,13 @@ protected ValueSetValidationResult ValidateItem(ValueSet item)
/// Validates the items and calls <see cref="M:Examine.Providers.BaseIndexProvider.PerformIndexItems(System.Collections.Generic.IEnumerable{Examine.ValueSet})" />
/// </summary>
/// <param name="values"></param>
public void IndexItems(IEnumerable<ValueSet> values)
public virtual void IndexItems(IEnumerable<ValueSet> values)

bielu: If we do what the comments above suggest, we don't need to override these methods, but I agree they should be virtual anyway :)

@Shazwazza (Owner) commented

Hi all, sorry I haven't had time to follow up on this just yet (still getting back from holiday mode!)

But I had a thought and I'm unsure what the solution is. The https://www.nuget.org/packages/UmbracoFileSystemProviders.Azure/ media blob storage package uses the v11 version. Since this is quite a popular package, and the only non-cloud official media blob storage package around, we need to ensure that things work side by side.

I'm unsure whether the v11 and v12 packages can co-exist; does anyone know? Otherwise there's an ugly ILMerge route that could be taken, or that media blob package will need to be updated too, and this could only exist alongside that newer version.

Thoughts?

@bielu commented Jan 6, 2021

@Shazwazza From what I know, v11 and v12 can coexist, but it is not recommended, and we should probably help update that package to stop using deprecated APIs anyway. (This is based on the fact that the types live in different namespaces, with and without the Microsoft prefix.)
Anyway, we are still waiting for @nzdev to review my PR to his PR :)
https://github.com/nzdev/Examine/pull/1/files
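For reference, a minimal sketch of the namespace split being described, assuming the Microsoft.Azure.Storage.Blob flavour of the v11 library (the older WindowsAzure.Storage package uses Microsoft.WindowsAzure.Storage.* namespaces, but the same point applies) alongside the Azure.Storage.Blobs v12 package. The container name and development-storage connection string are placeholders.

```csharp
using Azure.Storage.Blobs;            // v12 client library (Azure.* namespaces)
using Microsoft.Azure.Storage;        // v11 client library (Microsoft.* namespaces)
using Microsoft.Azure.Storage.Blob;

class SideBySide
{
    static void Main()
    {
        // Local storage emulator / Azurite shortcut, used here only as a placeholder.
        const string conn = "UseDevelopmentStorage=true";

        // v11 object model
        CloudStorageAccount account = CloudStorageAccount.Parse(conn);
        CloudBlobContainer v11Container = account.CreateCloudBlobClient()
                                                 .GetContainerReference("examine-index");

        // v12 object model, living in a different namespace so both can coexist
        BlobContainerClient v12Container = new BlobContainerClient(conn, "examine-index");

        System.Console.WriteLine($"{v11Container.Name} / {v12Container.Name}");
    }
}
```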

@nzdev (Contributor, Author) commented Jan 9, 2021

@bielu commented Jan 16, 2021

@Shazwazza After a few messages with @nzdev, we figured out that the best way to handle all providers for a remote directory is to abstract out the APIs for Azure/S3 etc. Can you say which approach you prefer for moving the abstractions?

  1. Move the renamed directory to the main Examine package, and later keep only the Azure-specific parts in Examine.AzureDirectory?
  2. Separate the abstraction into a third package, Examine.RemoteDir, and use that as a dependency in Examine.AzureDirectory?

@Shazwazza (Owner) commented
Hi all,

"Can you say which approach you prefer for moving the abstractions?"

At this stage I think you both know more about the requirements than I do, since my head hasn't been in this part of the codebase as much as yours has recently. Having a separate third package seems like a 'cleaner' solution and also means that if changes are needed within that package, it can be deployed independently of an Examine version. But I'm happy to take your lead; what do you think?

@Shazwazza (Owner) left a comment

I've left lots of comments/questions inline. I'm unsure about the ExamineIndexWriter change. It seems that all the logic it deals with really has nothing to do with the default LuceneEngine. Things like dirty checking, etc. are all to do with specific implementations of LuceneIndex. It seems like it would make more sense to have an inherited class deal with the stuff that is specific to 'remote' directories.

catch (Azure.RequestFailedException ex) when (ex.Status == 409)
{
    // File already exists
    throw;

Shazwazza (Owner): Is there a reason to catch this and then just re-throw? Or is this just for debugging when breakpointing?


public Lucene.Net.Store.Directory CacheDirectory { get; protected set; }

Shazwazza (Owner): Should this be protected set? This is a ctor dependency, so shouldn't it be readonly?


Reply: @Shazwazza It can't be readonly, as it gets swapped at runtime; that is why it is protected. If you check, there is this logic:

CacheDirectory = new SimpleFSDirectory(directory);

To avoid corruption of indexes :)
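A tiny sketch of the pattern being described, i.e. why the setter cannot be readonly; ResetCacheDirectory is a hypothetical helper name, not a member of the actual class.

```csharp
// Hypothetical helper illustrating why CacheDirectory needs a protected setter:
// on a bad/corrupt cache the implementation swaps in a fresh directory at runtime.
protected void ResetCacheDirectory(System.IO.DirectoryInfo localCacheFolder)
{
    CacheDirectory = new Lucene.Net.Store.SimpleFSDirectory(localCacheFolder);
}
```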

Trace.WriteLine($"INFO Syncing file {fileName} for {RootFolder}");
// then we will get it fresh into local deflatedName
// StreamOutput deflatedStream = new StreamOutput(CacheDirectory.CreateOutput(deflatedName));
using (var deflatedStream = new MemoryStream())

Shazwazza (Owner): deflatedStream isn't the correct name here; it's just the stream the blob is downloaded to. I realize this was copy/pasted from other code, but it should be renamed; even just memStream or whatever is fine.

blob.DeleteIfExists();
SetDirty();

Trace.WriteLine($"DELETE {_blobContainer.Uri}/{name}");
Trace.WriteLine($"INFO Deleted { _blobContainer.Uri}/{name} for {RootFolder}");

Shazwazza (Owner): Do we need 2x trace here?

@@ -370,7 +392,7 @@ protected override void PerformIndexItems(IEnumerable<ValueSet> values, Action<I
public void EnsureIndex(bool forceOverwrite)
{
if (!forceOverwrite && _exists.HasValue && _exists.Value) return;

if (_directory is ExamineDirectory examineDirectory && examineDirectory.IsReadOnly) return;

Shazwazza (Owner): If we keep this idea of ExamineDirectory, then this check should be encapsulated in a protected property IsReadOnly, since the same check (if (_directory is ExamineDirectory examineDirectory && examineDirectory.IsReadOnly) return;) is done a few times.


Reply: @Shazwazza As pointed out earlier, I am not sure about that either :) but I didn't remove it in my PR to this PR. :)
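A minimal sketch of the encapsulation suggested above, assuming a hypothetical IsReadOnlyDirectory property inside LuceneIndex; this illustrates the suggestion rather than code from the PR.

```csharp
// Hypothetical encapsulation of the repeated check inside LuceneIndex.
private bool IsReadOnlyDirectory =>
    _directory is ExamineDirectory examineDirectory && examineDirectory.IsReadOnly;

public void EnsureIndex(bool forceOverwrite)
{
    if (!forceOverwrite && _exists.HasValue && _exists.Value) return;

    if (IsReadOnlyDirectory) return;

    // ... rest of EnsureIndex unchanged
}
```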

if (_directory is ExamineDirectory examineDirectory)
{
    //Calling commit causes fdt,fdx,fnm,frq,nrm,prx,tii,tis,tvd,tvf,tvx files to be deleted.
    //TODO:

Shazwazza (Owner): Is this code comment still relevant? Commit shouldn't remove files IIRC; that happens when merging occurs.

@@ -1031,7 +1100,14 @@ public void ScheduleCommit()
if (_index._cancellationTokenSource.IsCancellationRequested)
{
//perform the commit
_index._writer?.Commit();
if (_index._writer is ExamineIndexWriter examineIndexWriter)

Shazwazza (Owner): This same check is performed a lot; it would be better to encapsulate this logic so there's no duplication.
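One way to remove that duplication, sketched under the assumption that ExamineIndexWriter's Commit performs whatever manifest bookkeeping the custom writer needs; CommitIndexWriter is a hypothetical helper, not existing Examine code.

```csharp
// Hypothetical helper so the ExamineIndexWriter type check lives in one place.
// Whatever extra manifest/dirty bookkeeping the custom writer needs would be
// triggered from here instead of at every commit call site.
private static void CommitIndexWriter(Lucene.Net.Index.IndexWriter writer)
{
    if (writer == null) return;

    if (writer is ExamineIndexWriter examineIndexWriter)
    {
        // assumed: the custom writer's Commit also updates its manifest/dirty state
        examineIndexWriter.Commit();
    }
    else
    {
        writer.Commit();
    }
}
```

Call sites would then just invoke the helper rather than repeating the type check.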

{
//TODO: if readonly index, unlock the local index?

Shazwazza (Owner): I would assume so, yes, but shouldn't the IndexWriter in that case already just be the local index?

}
catch (Exception e)
{
    //It's the initial call to this at the beginning or after successful commit

Shazwazza (Owner): This try/catch seems significant; was this prompted by something during debugging? Should this try/catch be ported individually to another PR?


Reply: @Shazwazza After the refactor it is actually redundant, as it only happened in read-only mode, and I changed the way read-only is handled so it never occurs again. I left it in because during debugging I hit an issue where the old reader was already disposed and the new one was not ready yet, and adding that try/catch fixed about 10-15% of my crashes while debugging :)

@bielu commented Jan 25, 2021

@Shazwazza Some of those comments will already be addressed once @nzdev merges my PR into this PR :)
https://github.com/nzdev/Examine/pull/1/files

@nzdev closed this Jun 23, 2021