VB Forum / Win API / April 2007




Do Windows files have Unique IDs? Can they be retrieved by code?

YisMan - 23 Mar 2007 11:28 GMT
Hi Everyone,

I keep an Access database of all my song files, along with their respective
attributes. I want to be able to make changes to their attributes/properties
in the DB and use code to update them in Windows. As the names and hence the
full paths of these files may fall out of sync with the DB, I would like to
know this:

Does Windows keep a unique ID for each file in the file system throughout
its lifetime which will never change? and if yes, how will I retrieve it by
code so I can keep it in my DB in the file's record?

I checked the basic properties of the FileSystemObject as well as the
documentation of WMI, I haven't found anything yet.

Any ideas/suggestions, anyone?
Thankfully, YisMan
Dave O. - 23 Mar 2007 11:42 GMT
> Hi Everyone,

> I checked the basic properties of the FileSystemObject as well as the
> documentation of WMI, I haven't found anything yet.

That's because there is nothing to find. Think about it for a moment: there
are an infinite number of potential files, but a Long gives you only a
little over 4 billion possible values, so it is not possible for every file
to have a unique number, nor is there any useful reason to have one. You can
generate a CRC for any file. This is a number which can be used to check the
file for corruption, as it changes if the file changes, but for a quantity of
large files such as media it would take too long to calculate and would
change if the file's tag was edited.

If you want to avoid the database losing sync with the files, put the
editing tools into the database front end and have it update the file & the
database together.

If you can read the headers of your media files and know how long the
tag/header is, you should be able to grab a K or 2 of the file which will be
the same regardless of the size of the tag or header. You can then take a
CRC of this excerpt and put that into your database, and the program can
then identify the file even if its name and location are changed and the tag
edited.

Best Regards
Dave O.
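
A minimal sketch of the excerpt idea above, in Python rather than VB for brevity. It assumes the files are MP3s that may start with an ID3v2 tag (a 10-byte header whose size field is four 7-bit bytes); the names `id3v2_size` and `excerpt_crc` and the 2 KB default are illustrative choices, not anything from the thread.

```python
import zlib

def id3v2_size(header: bytes) -> int:
    """Return the total byte length of a leading ID3v2 tag, or 0 if none."""
    if len(header) < 10 or header[:3] != b"ID3":
        return 0
    # The tag size is stored as four 7-bit ("synchsafe") bytes.
    size = 0
    for byte in header[6:10]:
        size = (size << 7) | (byte & 0x7F)
    footer = 10 if header[5] & 0x10 else 0  # optional ID3v2.4 footer
    return 10 + size + footer

def excerpt_crc(path: str, length: int = 2048) -> int:
    """CRC-32 of the first `length` bytes that follow any ID3v2 tag."""
    with open(path, "rb") as f:
        start = id3v2_size(f.read(10))
        f.seek(start)
        return zlib.crc32(f.read(length))
```

Because the excerpt starts after the tag, editing or resizing the tag should leave the value unchanged. Other tag formats would need their own length check.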
YisMan - 30 Mar 2007 08:20 GMT
Hi Dave,

Thank you very much for your detailed explanation, I found it very edifying.
I do find it very strange that Windows doesn't have a unique identifier for
each file in its own internal database. How does Windows keep track of the
ever-changing metadata of its files, such as name, path, date modified etc.,
if it doesn't have some ID key to reference?

Now to my problem. I had never heard of CRCs. I did some research on the web on
the subject, and it's way beyond my algebra. Even if I find a ready snippet, I
think it'll take forever for my code to iterate through a few thousand files
checking each one's CRC to see if that's the one that needs to be updated.

What you write about keeping the editing tools in the DB front, great idea,
actually I've already done that, to some degree. What's odd here is that the
lyrics, which are stored in Windows, cannot be accessed through the
FileSystemObject, as far as I tried. They can only be accessed through the
WMP.GetItem.Info method. Shouldn't Windows expose the extended properties
which it stores? Am I missing something?

Thanks again for your clear and detailed reply. It's a pleasure having such
helpful people around.


Thankfully, YisMan  BS"D

J French - 30 Mar 2007 08:56 GMT
On Fri, 30 Mar 2007 00:20:02 -0700, YisMan
<yisman@att.net> wrote:


>> If you can read the headers of your media files and know how long the
>> tag/header is, you should be able to grab a K or 2 of the file which will be
>> the same regardless of the size of the tag or header, you can then take a
>> CRC of this excerpt and put that into your database then the program can
>> identify the file even if its name and location are changed and the tag
>> edited.

Files are just chunks of data on disk

Their Name, dates, size and location are stored in a Directory Entry
that also points to the start of the file.

Certain programs and APIs know about the /expected/ internal format of
specific types of files and can go in to fetch data

www.wotsit.org  has information about file formats

As for CRCs, I would take the CRC of the entire file and even if I
found two CRCs the same I would not assume that the contents are
identical.

There is a small, but very real possibility that two different chunks
of data will have the same CRC
Dave O. - 02 Apr 2007 10:14 GMT
> As for CRCs, I would take the CRC of the entire file and even if I
> found two CRCs the same I would not assume that the contents are
> identical.
>
> There is a small, but very real possibility that two different chunks
> of data will have the same CRC

There is a problem with media files where tagging information can be added,
removed or edited but the actual media part of the file is unchanged. In
these cases a CRC of the whole file would report the file as different when
you want it to be the same. If you take a CRC of just the media part you
eliminate that potential problem. Choosing a chunk of the media file which
you can always find, regardless of any header or tag (or absence thereof),
means that the same media will return the same CRC.

There is no good reason to compute the CRC of possibly 20 MB for a long, high
quality MP3 file when a few KB would identify the file just as uniquely.

What you say about the possibility of different files sharing a CRC is very
true and eventually inevitable. Using chunks offers a way to ameliorate
this problem: designate 2 chunks from very different positions, get
the CRC for both, and then if one matches use the second as extra validation.
The chance of both CRCs matching for two different files is
preposterously low and can be disregarded for anything that is not
life-critical.

Regards

Dave O.
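
The two-chunk check described above might be sketched like this (Python, not VB). The parameter `audio_start` (the offset where the media data begins) and the 64 KB `gap` are assumptions made for illustration. Both offsets are measured from the start of the media data so that a tag appended at the end of the file does not shift them.

```python
import zlib

def chunk_crc(path: str, offset: int, length: int = 2048) -> int:
    """CRC-32 of `length` bytes starting at `offset` (fewer if the file ends)."""
    with open(path, "rb") as f:
        f.seek(offset)
        return zlib.crc32(f.read(length))

def two_chunk_id(path: str, audio_start: int = 0, gap: int = 65536) -> tuple:
    """Pair of CRCs: one at the start of the media data, one `gap` bytes later."""
    return (chunk_crc(path, audio_start),
            chunk_crc(path, audio_start + gap))

def same_media(id_a: tuple, id_b: tuple) -> bool:
    """Treat two files as the same media only if both chunk CRCs agree."""
    return id_a == id_b
```

Two files that share an opening chunk (silence, for example) will still be told apart as long as the second chunk differs.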
Dave O. - 02 Apr 2007 10:14 GMT
> I think it'll take forever for my code to iterate through a few thousand
> files checking each one's CRC to see if that's the one that needs to be
> updated.

Here is the code to provide a standard CRC -  All in a module

Private CRCTable(255)   As Long

Public Sub InitCRC()
Dim dwPolyN As Long
Dim i       As Integer
Dim j       As Integer
Dim dwCRC   As Long

' Fill lookup table for CRC calculation - This sub is called once on loading
dwPolyN = &HEDB88320
For i = 0 To 255
 dwCRC = i
 For j = 8 To 1 Step -1
   If (dwCRC And 1) Then
     dwCRC = ((dwCRC And &HFFFFFFFE) \ 2&) And &H7FFFFFFF
     dwCRC = dwCRC Xor dwPolyN
   Else
     dwCRC = ((dwCRC And &HFFFFFFFE) \ 2&) And &H7FFFFFFF
   End If
 Next j
 CRCTable(i) = dwCRC
Next i

End Sub

Public Function GetCRCStr(Fnt As String) As Long
Dim b()         As Byte
Dim lp          As Long
Dim CRC32Result As Long
Dim iLookup     As Integer
Dim fBuff       As String
Dim ff          As Long

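' InitCRC must have been called once before this function is used.
fBuff = Fnt
ff = Len(Fnt)
CRC32Result = &HFFFFFFFF
ReDim b(ff)
b = StrConv(fBuff, vbFromUnicode)   ' ANSI bytes of the string
For lp = 0 To ff - 1
 iLookup = (CRC32Result And &HFF) Xor b(lp)
 ' nasty shr 8 with vb :/  Clear the low byte so the division is exact,
 ' then mask off the sign-extended top byte.
 CRC32Result = ((CRC32Result And &HFFFFFF00) \ &H100) And 16777215
 CRC32Result = CRC32Result Xor CRCTable(iLookup)
Next
GetCRCStr = Not CRC32Result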

End Function

''*** This was found on a site a long time ago - I can't remember where but
''*** whoever wrote it - many thanks.
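
For cross-checking results, here is a sketch of the same table-driven CRC-32 in Python (not VB). It uses the same polynomial constant as `dwPolyN` above and should agree with `zlib.crc32`. The helper `as_vb_long` is an illustrative addition showing how an unsigned result maps onto a signed VB Long.

```python
import zlib

POLY = 0xEDB88320  # reflected CRC-32 polynomial, same constant as dwPolyN

def make_table() -> list:
    """Build the 256-entry lookup table, one entry per possible byte value."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table

TABLE = make_table()

def crc32(data: bytes) -> int:
    """Standard CRC-32 of `data`, returned as an unsigned 32-bit integer."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = (crc >> 8) ^ TABLE[(crc ^ byte) & 0xFF]
    return crc ^ 0xFFFFFFFF

def as_vb_long(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit Long."""
    return value - 0x100000000 if value >= 0x80000000 else value
```

Python integers do not overflow, so the shifts need none of the masking tricks that the VB version uses to emulate an unsigned shift.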
YisMan - 04 Apr 2007 12:40 GMT
I thank you all for the help, but I'm afraid it's getting too complicated for my
hobby project. Meanwhile I'm managing to keep my paths up to date. If it
gets out of hand I guess I'll resort to CRCs.
Meanwhile G-d bless you all.

Thankfully, YisMan

Steve - 19 Apr 2007 20:54 GMT
> What you write about keeping the editing tools in the DB front, great idea,
> actually I've already done that, to some degree. What's odd here is that the
> lyrics, which are stored in Windows, cannot be accessed through the
> FileSystemObject, as far as I tried. They can only be accessed through the
> WMP.GetItem.Info method. Shouldn't Windows expose the extended properties
> which it stores? Am I missing something?

YisMan,

I know this thread is a bit old but if you or anyone else finds it and
is interested, the reason you cannot find the meta-data (i.e. lyrics)
for WMP managed files in the file itself is because WMP stores some
data only in its own database.  Somewhere there is a .chm (help file)
for the WMP object model that specifies which data is stored in both the
file's tag and the database, and which just in the database.

Hope this helps,
Steve
YisMan - 19 Apr 2007 22:24 GMT
Thank you very much, Steve.

Yes, I am still very interested in the subject. Can you tell me more about
this? What I'd like to know is:

A) Is there a way I could get/set the File's own properties w/o the WMP
object. Actually as per an answer I got from Alessandro Angeli, I downloaded
the Windows Media Format SDK. Unfortunately, for the time being I am still
struggling to convert the COM wrapper from C# to VB.NET (I'm illiterate in C.
Alessandro, if you see this, and have a VB translation I'd be grateful for
it). In any case, it is not the simplest object model. If you have a simpler,
more intuitive way of doing this, I'd love to hear about it.

B) Where is this database that WMP uses? Can I get hold of it? Can I
change/read the info therein? Back it up?

Thank you very much, YisMan

Steve - 20 Apr 2007 13:57 GMT

YisMan,

It sounds as if you are doing something very similar to what I did a
few years back...and running into the same difficulties.

I found the WMP library to be such a PIA to work with that I abandoned
it altogether and created my own DB and playlist management system.
The system I created uses ID3 ver. 2 tags to store the extended info
about my mp3 files.  I found a class on Mike Sutton's web site
(http://EDais.mvps.org/) that handles both ID3 version 1 and version 2 tags.
I, however, am not trying to store the lyrics, just simple data such as
artist, recorded date etc.

As for keeping my DB in sync with the actual files, I simply (as a
previous poster suggested) modify both the file and the DB at the same
time.  This approach of course assumes that no other mechanism is used
to edit the file data.  Since my app is just a jukebox/media library
app for my own personal use I can be sure that this won't happen.

Hope this helps,
Steve
Tony Proctor - 20 Apr 2007 15:14 GMT
I think what you're looking for YisMan is the "File Index". This can be
obtained from the BY_HANDLE_FILE_INFORMATION structure, via
GetFileInformationByHandle. The file index is bigger than a long but if you
represent it in textual form then you can still use it as a unique key. For
instance...

Const INVALID_HANDLE_VALUE = -1

Const OPEN_EXISTING = 3

Private Declare Function CloseHandle Lib "kernel32" ( _
   ByVal hObject As Long) As Long

Private Declare Function CreateFile Lib "kernel32" Alias "CreateFileA" ( _
   ByVal lpFileName As String, _
   ByVal dwDesiredAccess As Long, ByVal dwShareMode As Long, _
   ByVal lpSecurityAttributes As Long, _
   ByVal dwCreationDisposition As Long, _
   ByVal dwFlagsAndAttributes As Long, ByVal hTemplateFile As Long) As Long

Private Declare Function GetFileInformationByHandle Lib "kernel32" ( _
   ByVal hFile As Long, lpFileInformation As BY_HANDLE_FILE_INFORMATION) As Long

Private Type FILETIME
   dwLowDateTime As Long
   dwHighDateTime As Long
End Type

Private Type BY_HANDLE_FILE_INFORMATION
   dwFileAttributes As Long
   ftCreationTime As FILETIME
   ftLastAccessTime As FILETIME
   ftLastWriteTime As FILETIME
   dwVolumeSerialNumber As Long
   nFileSizeHigh As Long
   nFileSizeLow As Long
   nNumberOfLinks As Long
   nFileIndexHigh As Long
   nFileIndexLow As Long
End Type

Private tInfo As BY_HANDLE_FILE_INFORMATION 'Current file information

Private Function sGetFileInfo(sFile As String) As String
' Fill out tInfo with file information. Returns an error string if this
' fails, e.g. if the file doesn't exist. (sGetErrMessage is a helper that is
' not included in this post.)
Dim hFile As Long

   ' Open the file to get attributes (no I/O intended)
   hFile = CreateFile(sFile, 0, 0, 0, OPEN_EXISTING, 0, 0)
   If hFile = INVALID_HANDLE_VALUE Then
       sGetFileInfo = sGetErrMessage()
       Exit Function
   End If
   ' Read the unique file index, and the file's modifications date/time
   If GetFileInformationByHandle(hFile, tInfo) = 0 Then
       sGetFileInfo = sGetErrMessage()
   End If
   CloseHandle hFile
End Function

Private Function sFileIndex() As String
' Returns the unique file index for the current file as a 16-digit hex string
' (i.e. 64 bits formatted as hex)

   sFileIndex = Right$("0000000" & Hex(tInfo.nFileIndexHigh), 8) & _
       Right$("0000000" & Hex(tInfo.nFileIndexLow), 8)
End Function

   Tony Proctor
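
For comparison, a rough Python equivalent of `sFileIndex` (Python, not VB). It relies on `os.stat`, whose `st_ino` field is documented as the inode number on Unix and the file index on Windows; the function name `file_index_hex` is made up for this sketch.

```python
import os

def file_index_hex(path: str) -> str:
    """File index (inode number on Unix) as a hex string of at least 16 digits."""
    return format(os.stat(path).st_ino, "016X")
```

On a file system that supports stable indexes, renaming a file within the same volume should leave this value unchanged, while a copy should get a value of its own.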

Steve - 20 Apr 2007 15:48 GMT
> I think what you're looking for YisMan is the "File Index". This can be
> obtained from the BY_HANDLE_FILE_INFORMATION structure, via
> GetFileInformationByHandle.

Tony,

When does this index value get changed?

Let's assume I have two similar (but not exact) versions of the same
mp3 file with the same name but located in two different folders.  We
will call the first file "A" and the second file "B".
The mp3 tag data in my DB is correct and matches that of file "A" but
file "A" also has some data errors in the media portion so I want to
replace it with file "B".

To keep my DB in sync with the file I need the result to be that the
copied file ("B") assumes the index of the existing file ("A").  I
assume however that the file would maintain its original index.  Or
will a completely new index be generated?

Either case is manageable, provided I know which way it works and
provided that my app is aware of the switch.

Thanks,
Steve
Tony Proctor - 20 Apr 2007 16:22 GMT
The file index uniquely identifies the file, but not the file content.
Hence, a file keeps the same file index, even after modifications to its
content.

If you, or the OP, are keeping a catalog of file information then the file
index can be used as a key to relate those catalog records to the relevant
files. You can also use it to spot new or deleted files and so keep the
catalog in step with directory changes.

If you wanted to know when the file contents had changed, though, then you
would have to check the last-modified time in this same file information
structure (see the ftLastWriteTime field).

   Tony Proctor
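
The catalog bookkeeping described above could look roughly like this in Python (not VB). It is only a sketch: `scan` and `reconcile` are hypothetical names, the file ID is the `(st_dev, st_ino)` pair from `os.stat`, and the catalog is a plain dictionary standing in for the database table.

```python
import os

def scan(folder: str) -> dict:
    """Map file ID -> (path, last-modified time in ns) for files in `folder`."""
    found = {}
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        if os.path.isfile(path):
            st = os.stat(path)
            found[(st.st_dev, st.st_ino)] = (path, st.st_mtime_ns)
    return found

def reconcile(catalog: dict, found: dict) -> dict:
    """Compare a stored catalog with a fresh scan; both map ID -> (path, mtime)."""
    report = {"new": [], "deleted": [], "renamed": [], "modified": []}
    for file_id, (path, mtime) in found.items():
        if file_id not in catalog:
            report["new"].append(path)
            continue
        old_path, old_mtime = catalog[file_id]
        if path != old_path:
            report["renamed"].append((old_path, path))
        if mtime != old_mtime:
            report["modified"].append(path)
    for file_id, (old_path, _) in catalog.items():
        if file_id not in found:
            report["deleted"].append(old_path)
    return report
```

A rename shows up as the same ID with a new path, so the catalog record can simply be updated instead of being treated as one deleted file plus one new file.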

John - 21 Apr 2007 03:05 GMT
> The file index uniquely identifies the file, but not the file content.
> Hence, a file keeps the same file index, even after modifications to its
> content

I've found that an MD5 of the data is much faster than a CRC32.  Your
approach of storing the filename/filepath/filetime/filesize is also a
good way of trying to verify that the file is unchanged.  The only
problem is that any change to the embedded tags will change the file
information without changing the music content.  I've never really
figured out a good/fast way to be *absolutely* positive that there
haven't been any changes.  There are so many different possible tag
standards, and files are often tagged in a format that is invalid for
the particular filetype!

You might be interested in taking a look at MP3-Boss -- an Access
database (now 8 years in the making!) that tries to manage all this.
Originally, I figured it was a 6-month project!
http://www.mp3-boss.com
Tony Proctor - 23 Apr 2007 11:07 GMT
You don't really need the file path or size John. The last-modified time
will tell you whether any change was made to the file content, although it
cannot distinguish between changes to tag-content and to music-content.
Also, if the same data was written back (i.e. no net change) then it may
look like there was a change when there wasn't. This is where a CRC or MD5
might be better.

The use of the file-index is better than relying on the file name/path since
it references the file body. The name/path are merely entries in one-or-more
directory files, and the same file body can even have multiple directory
entries pointing to it. Hence, the file-index is unique. It's also
unaffected by a rename of the file - which is quite common with music files

   Tony Proctor

John - 25 Apr 2007 14:12 GMT
On Apr 23, 6:07 am, "Tony Proctor"
<tony_proctor@aimtechnology_NoMoreSPAM_.com> wrote:
> You don't really need the file path or size John. The last-modified time
> will tell you whether any change was made to the file content, although it
> cannot distinguish between changes to tag-content and to music-content.

At one time I had tried using the FileIndex to uniquely identify files
-- I was particularly interested in identifying a CD based on the
FileIndex -- but if you can believe it, the FileIndex for the files
changed every time you'd write to the CD (even though I was only
adding files to the CD).

I vaguely recall that the FileIndex for a hard drive would change
every time you'd reboot the computer (or maybe it was if you renamed
the volume?).  My main interest was to have a unique ID for a CD
though...so unfortunately I didn't write down my findings!  Have you
verified that the FileIndex is unique even after rebooting the
computer?  Also, I believe the FileIndex changes for network mapped
drives if the file is closed and then reopened, and also if you move
the file across volumes.

I don't find a lot of information out there regarding the limitations
of FileIndex, but I found this on a search -- this is probably what I
was seeing "The FileIndex is a 64-bit number that indicates the
position of the file in the Master File Table (MFT).  It is stable
between successive starts of the system, provided the MFT does not
overflow and therefore has to be rebuilt."

So...the problem with FileIndex is that it can't quite be counted on
as a permanent unique identifier.  Maybe in Vista?

John
Tony Proctor - 25 Apr 2007 14:39 GMT
The 'file index' is part of the internal disk and file-system organisation
John. For a hard-drive, it doesn't change if you reboot the system, or if
you rename the file. I haven't tried renaming the volume but I would expect
it to be unchanged there too since the disk structure remains intact. If you
find you have to rebuild your MFT then you will have had a very serious
computer failure, and your file IDs will be the least of your worries
(needless to say, it very rarely happens)

If you copy a file from one place to another then the copy will have a
different index - because it's a different file as far as the O/S is
concerned. Similarly, if you reburn a CD then you're changing the disk
structure, and writing entirely new files (even if the names and contents
were the same as before)

We use file indexes a lot here for cache entries, which is just another type
of file catalog. If we need to process a file, say, to compile a
memory-based version of its storage, we can reliably tell whether we already
have it loaded by simply looking up the file index in a Collection or
Dictionary object. We can then test the file-modified time to see if the
contents were modified since the time we loaded it. If so, we re-load the
file at that point and update our stored copy of the file-modified
time.

   Tony Proctor
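
The caching scheme described above might be sketched as follows (Python, not VB). This is an illustration under assumptions, not the poster's actual code: `FileCache` and its `loader` callback are hypothetical, and the key is the `(st_dev, st_ino)` pair from `os.stat`.

```python
import os

class FileCache:
    """Cache of loaded file contents keyed by file ID rather than by name."""

    def __init__(self, loader):
        self._loader = loader   # hypothetical callable: path -> loaded object
        self._entries = {}      # (st_dev, st_ino) -> (mtime_ns, loaded object)
        self.loads = 0          # how many times the loader actually ran

    def get(self, path: str):
        st = os.stat(path)
        key = (st.st_dev, st.st_ino)
        cached = self._entries.get(key)
        if cached is not None and cached[0] == st.st_mtime_ns:
            return cached[1]    # same file, unchanged since it was loaded
        value = self._loader(path)
        self.loads += 1
        self._entries[key] = (st.st_mtime_ns, value)
        return value
```

Because the key is the file ID, asking for the same file under a new name is still a cache hit, while a changed last-modified time forces a reload.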

Karl E. Peterson - 25 Apr 2007 18:53 GMT
> If you copy a file from one place to another then the copy will have a
> different index - because it's a different file as far as the O/S is
> concerned.

What about a defrag?

.NET: It's About Trust!
http://vfred.mvps.org

Tony Proctor - 25 Apr 2007 19:19 GMT
Good question Karl. I'm not entirely sure as a defrag moves the individual
blocks in the file around the disk rather than the copying the whole file
body around. Hence, I believe it still stays fixed but I haven't tested that

   Tony Proctor

Karl E. Peterson - 25 Apr 2007 19:26 GMT
> Good question Karl. I'm not entirely sure as a defrag moves the individual
> blocks in the file around the disk rather than the copying the whole file
> body around. Hence, I believe it still stays fixed but I haven't tested that

Makes sense that they'd stay intact, but when "makes sense" is the best one has to
go on...  Well, we all "been there" right? <g>  Otherwise, these sound like pretty
cool IDs to know about!

.NET: It's About Trust!
http://vfred.mvps.org

Jim Mack - 25 Apr 2007 20:02 GMT
> Makes sense that they'd stay intact, but when "makes sense" is the best one
> has to go on...  Well, we all "been there" right? <g>  Otherwise, these
> sound like pretty
> cool IDs to know about!

Since files below a certain size may (at the option of the FS) be kept entirely in the MFT, I suspect that a defrag is going to upset the index values, in at least some cases. And one case is enough. :-)

And of course FAT file systems, including FAT32, don't have a MFT and so no way to persist any index.


       Jim

Karl E. Peterson - 25 Apr 2007 20:07 GMT
> And of course FAT file systems, including FAT32, don't have a MFT and so no way
> to persist any index.

Yeah, I guess it all goes back to the docs, huh?  They recommend using these only as
a way to compare whether two existing file handles point to the same file.  (Btw
Tony, wouldn't you want to combine the volume label with your key string?)

.NET: It's About Trust!
http://vfred.mvps.org

Tony Proctor - 25 Apr 2007 21:05 GMT
> Tony, wouldn't you want to combine the volume label with your key string?

Probably Karl. I assumed the files being checked were all on the same
volume, but otherwise 'yes'

   Tony Proctor
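
One possible shape for such a combined key, sketched in Python (not VB). Note the assumption: it uses the volume serial number (the `dwVolumeSerialNumber` field of the structure posted earlier) rather than the volume label, since a label can be edited by the user and need not be unique. The function name `file_key` is made up.

```python
def file_key(volume_serial: int, index_high: int, index_low: int) -> str:
    """Join the volume serial number and 64-bit file index into one hex key."""
    fields = (volume_serial, index_high, index_low)
    # Mask to 32 bits so values read as negative signed Longs still
    # format as eight hex digits each.
    return "".join(format(value & 0xFFFFFFFF, "08X") for value in fields)
```

The result is a fixed-width 24-character string, which keeps it usable as a text key in a database column.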

Karl E. Peterson - 25 Apr 2007 21:29 GMT
>> Tony, wouldn't you want to combine the volume label with your key string?
>
> Probably Karl. I assumed the files being checked were all on the same
> volume, but otherwise 'yes'

Yeah, for a "unique" key, that's really the only way.  I'm gonna remember this one.
:-)

.NET: It's About Trust!
http://vfred.mvps.org

Tony Proctor - 27 Apr 2007 12:54 GMT
I was reading around to get a clearer picture of the differences here Jim.
The consensus seems to be that NTFS provides "defrag-safe" file indexes
(even for small files held in the MFT), but that FAT ones are not
"defrag-safe". I haven't confirmed that myself though

   Tony Proctor

Jim Mack - 27 Apr 2007 14:49 GMT
> I was reading around to get a clearer picture of the differences here
> Jim. The consensus seems to be that NTFS provides "defrag-safe" file
> indexes (even for small files held in the MFT), but that FAT ones are
> not "defrag-safe". I haven't confirmed that myself though

Trouble is, without an authoritative statement from MS, it's proof by example and so always subject to instant falsification. It's the 'white crow' problem: we can infer that all crows are black, until a white one comes along. We might be able to gain certainty if the internals were documented.

I didn't follow this thread from the beginning, so I'm not sure what the goal is. Is it to uniquely identify a file, or to detect changes, or to determine the order that files were added, or something else?


       Jim

Tony Proctor - 27 Apr 2007 16:50 GMT
In general terms, I believe the goal was to keep a separate "file catalog" in step
with the underlying files. In effect, to have the catalog reference a unique
file identifier to keep its data in synch with the files, to be able to spot
new/deleted files using the same ID, and then to use either last-modified
time or possibly CRC/MD5 to detect when the content of the said files has
been changed.

   Tony Proctor

Jim Mack - 27 Apr 2007 21:54 GMT
> In general terms I believe was to keep a separate "file catalog" in
> step with the underlying files. In effect, to have the catalog
> reference a unique file identifier to keep its data in synch with the
> files, to be able to spot new/deleted files using the same ID, and
> then using either last-modified time or possibly CRC/MD5 to detect
> when the content of the said files has been changed

Ah. Well, I sure wouldn't rely on these indexes for anything critical, but maybe it's enough for this application.

It's easy enough to just keep an array of directory contents, with a hash of file size and CRC of the contents (or a portion).


       Jim
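
A sketch of that alternative in Python (not VB): keep a snapshot of a directory where each file's signature is its size plus a CRC of a leading portion, then match old and new snapshots by signature. The names `snapshot` and `find_moved` and the 4 KB sample size are illustrative assumptions.

```python
import os
import zlib

def snapshot(folder: str, sample: int = 4096) -> dict:
    """Map file name -> (size, CRC-32 of the first `sample` bytes)."""
    result = {}
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isfile(path):
            with open(path, "rb") as f:
                crc = zlib.crc32(f.read(sample))
            result[name] = (os.path.getsize(path), crc)
    return result

def find_moved(old: dict, new: dict) -> dict:
    """Pair vanished names with newly appeared names of matching signature."""
    added = {}
    for name, sig in new.items():
        if name not in old:
            added.setdefault(sig, []).append(name)
    moved = {}
    for name, sig in old.items():
        candidates = added.get(sig, [])
        if name not in new and len(candidates) == 1:
            moved[name] = candidates[0]
    return moved
```

Unlike the file index, this signature changes whenever a tag edit alters the file size or the sampled bytes, so it suits rename detection better than edit tracking.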

Tony Proctor - 25 Apr 2007 19:47 GMT
Aha, I think you may be getting confused by the MSDN documentation for the
nFileIndexHigh/nFileIndexLow fields John. This has confused other people
too (e.g.
http://groups.google.ie/group/microsoft.public.vb.winapi/browse_frm/thread/9f258cadf993c8b2/dd8cad3dd42df5e6?hl=en#dd8cad3dd42df5e6
).
It suggests that the file index may change on a system boot, or after
closing and re-opening a file.  Sigh....

Having worked on file system development I know how file indexes and inodes
are used. I think what that MSDN sentence is basically saying is that when you
open a file by name, you then have access to its unique file-index, and that
remains fixed until you close it. This is because you're pointing to the
file body at that point, and anything that happens to the associated
directory entry (e.g. deleted, re-created) is then irrelevant. However, if
you close the file and re-open it -- again, by name -- then you may have
opened a different instance of the file (where it's been physically
re-created) and so might see a different file-index. This is very misleading
though since the file name (or path) never uniquely identifies a file, and
everyone appreciates that you may delete a file, and then create a whole new
file with the same name. That's exactly why file indexes are so useful. The
file indexes for the old and new files in this scenario would be different,
and so you would know it's a different file even though someone has given it
the same name.

   Tony Proctor

John - 25 Apr 2007 15:09 GMT
On Apr 23, 6:07 am, "Tony Proctor"
<tony_proctor@aimtechnology_NoMoreSPAM_.com> wrote:
> You don't really need the file path or size John. The last-modified time
> will tell you whether any change was made to the file content, although it
> cannot distinguish between changes to tag-content and to music-content.

Tony,

I had taken a look at FileIndex as a unique identifier, but decided
that it really wasn't a permanent unique identifier.

I was specifically interested in using the FileIndex to identify files
on a CD -- but found that every time I'd add files to the CD -- the
FileIndex would change.

I found this information:
The FileIndex is a 64-bit number that indicates the position of the
file in the Master File Table (MFT).  It is stable between successive
starts of the system, provided the MFT does not overflow and therefore
has to be rebuilt. On WinNT systems (NT, 2K, XP) the FileIndex is also
returned for directories, on Win9x (95, 98, ME) it returns zero for
directories. It is not stable for files on network drives; successive
calls to GetFileInformationByHandle return different values.

The FileIndex also changes if you move the file across volumes.

During my testing (quite a long while ago!), I seem to recall that it
would also change if you renamed the volume.

Have you found FileIndex to be permanent in your testing?

John
 


