#archiveteam 2016-01-20,Wed


Time Nickname Message
00:02 PotcFdk has joined #archiveteam
00:10 Ghost_of_ has quit IRC (Read error: Connection reset by peer)
00:13 w0rp has joined #archiveteam
00:13 mutoso has joined #archiveteam
00:13 wednesday has joined #archiveteam
00:13 Atom-- has joined #archiveteam
00:13 ats_ has joined #archiveteam
00:13 zino has joined #archiveteam
00:13 z00nx has joined #archiveteam
00:13 nekomune has joined #archiveteam
00:13 ersi has joined #archiveteam
00:13 midas has joined #archiveteam
00:13 Nemo_bis has joined #archiveteam
00:13 Jon has joined #archiveteam
00:13 Rye has joined #archiveteam
00:13 Fletcher has joined #archiveteam
00:13 filippo__ has joined #archiveteam
00:13 espes___ has joined #archiveteam
00:13 useretai- has joined #archiveteam
00:13 will has joined #archiveteam
00:13 irc.underworld.no sets mode: +oo midas Nemo_bis
00:20 WinterFox has joined #archiveteam
00:35 afics has quit IRC (Read error: Operation timed out)
00:35 afics has joined #archiveteam
00:36 kyan has joined #archiveteam
00:54 mismatchm has quit IRC (Ping timeout: 360 seconds)
00:58 megaminxw has joined #archiveteam
01:28 megaminxw has quit IRC (Quit: Leaving.)
01:34 nickname has joined #archiveteam
01:35 nickname Has angelfire.com ever been archived?
01:35 nickname has quit IRC (Client Quit)
02:10 VADemon has quit IRC (Quit: left4dead)
02:11 dashcloud has quit IRC (Read error: Operation timed out)
02:12 dashcloud has joined #archiveteam
02:25 schbirid2 has joined #archiveteam
02:28 username1 has quit IRC (Read error: Operation timed out)
02:30 JesseW nickname -- is there any mention of angelfire on the wiki?
02:36 kyan JesseW, nickname: http://archiveteam.org/index.php?title=Angelfire
02:37 JesseW cool
04:00 brayden has joined #archiveteam
04:10 DFJustin a large chunk of it was run through archivebot before but the whole thing hasn't been specifically archived
04:40 Coderjoe has quit IRC (Read error: Connection reset by peer)
04:40 Coderjoe has joined #archiveteam
05:07 SN4T14 has joined #archiveteam
05:15 SN4T14 has quit IRC (Read error: Operation timed out)
05:48 JesseW I'm thinking about producing a file of (reported) IA file hashes, to be (hopefully) widely distributed, and regularly re-generated, as a way of providing some 3rd-party accountability for changes in the contents of IA items. I'd love some comments on both the general idea, and the specifics of the format.
05:50 JesseW The format I'm currently thinking of is a 2-column, tab-delimited file, with the first column being the identifier, a forward slash, and the file path, and the 2nd column being the md5 hash, as a 32-digit hex number.
05:51 JesseW This isn't the *most* minimal in size, but I think it's a good compromise between minimal size and simplicity
05:58 yipdw #internetarchive.bak is doing this to some extent
05:58 yipdw in the sense that git-annex needs content fingerprinting to know what's changed anyway and there's multiple copies of various shards
05:59 zino has quit IRC (Ping timeout: 252 seconds)
06:01 xmc why use md5 when there are better hashes?
06:04 SN4T14 has joined #archiveteam
06:05 JesseW xmc: Because, for whatever reason, Jake at IA only included the md5 hash in the census he did in March 2015.
06:05 xmc o
06:05 xmc ok
06:06 JesseW So in order to make a comparison with that, I need to use the same
06:06 xmc MD5 is a little bit more reliable than a CRC
06:06 JesseW IA currently reports md5, sha1 and crc32 hashes for its files.
06:07 JesseW yipdw: I know IA.BAK is doing similar (and much more so, making actual full copies of the data) -- but it doesn't (yet) cover most of IA's public collections.
06:08 yipdw tab is probably fine provided no identifier names contain tabs
06:09 xmc identifiers are normally restricted to [-_.0-9A-Za-z] iirc
06:09 JesseW no identifier name *or* file name.
06:09 xmc iiuc they can contain anything url-safe but IA doesn't like you to do that
06:09 JesseW the march 2015 census file is ... somewhat odd.
06:10 JesseW It has 113 duplicate items
06:12 JesseW and only 13,075,195 identifiers, although the list of retrieved identifiers has 14,921,581. I don't know what happened to the missing 1,846,386.
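The counts above are consistent: 14,921,581 - 13,075,195 = 1,846,386. Working out *which* identifiers are duplicated or missing is a counting plus set-difference problem. A minimal Python sketch, assuming both lists are available as iterables of identifier strings; the function name `census_gaps` is hypothetical.

```python
from collections import Counter
from typing import Iterable


def census_gaps(census_ids: Iterable[str],
                retrieved_ids: Iterable[str]) -> tuple[list[str], set[str]]:
    """Return (identifiers listed more than once in the census,
    identifiers that were retrieved but never appear in the census)."""
    counts = Counter(census_ids)
    duplicates = sorted(ident for ident, n in counts.items() if n > 1)
    # Iterating a Counter yields its keys, i.e. each census identifier once.
    missing = set(retrieved_ids).difference(counts)
    return duplicates, missing
```

Because the census contains duplicates, the size of `missing` need not equal the raw difference in counts; the two agree only when neither list repeats an identifier and every census identifier also appears in the retrieved list.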
06:13 xmc ... as with all censuses, sometimes there is an undercount :P
06:14 JesseW heh
06:15 JesseW any thoughts about whether I should have three columns: identifier filename hash OR two columns: identifier/filename hash?
06:16 xmc three
06:16 yipdw three, it's uniform
06:16 JesseW ok, that's definitive. :-) three it is.
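A minimal Python sketch of the three-column, tab-delimited layout agreed on here (identifier, file path, md5), assuming one row per file and text I/O. Since the format depends on names never containing tabs (the concern raised at 06:08), the writer refuses such rows rather than escaping them; `write_hashes` and `read_hashes` are hypothetical names.

```python
import io
import re
from typing import Iterable, Iterator, TextIO

# One row per file: (identifier, file path within the item, md5 hex digest).
Row = tuple[str, str, str]

MD5_HEX = re.compile(r"[0-9a-f]{32}")


def write_hashes(rows: Iterable[Row], out: TextIO) -> None:
    """Write rows as identifier<TAB>path<TAB>md5, one per line."""
    for identifier, path, md5 in rows:
        for field in (identifier, path):
            # A tab or line break inside a name would corrupt the columns.
            if any(ch in field for ch in "\t\r\n"):
                raise ValueError(f"tab or line break in field: {field!r}")
        md5 = md5.lower()  # normalise so identical hashes compare equal
        if not MD5_HEX.fullmatch(md5):
            raise ValueError(f"not a 32-digit hex md5: {md5!r}")
        out.write(f"{identifier}\t{path}\t{md5}\n")


def read_hashes(inp: TextIO) -> Iterator[Row]:
    """Parse the format written by write_hashes, checking the column count."""
    for lineno, line in enumerate(inp, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 columns, got {len(fields)}")
        yield fields[0], fields[1], fields[2]


if __name__ == "__main__":
    buf = io.StringIO()
    write_hashes([("example_item", "audio/track01.mp3",
                   "D41D8CD98F00B204E9800998ECF8427E")], buf)
    buf.seek(0)
    print(list(read_hashes(buf)))
```

Keeping the identifier in its own column means file paths, which routinely contain `/`, never have to be split apart again.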
07:03 JesseW I've now started converting the existing census file into the hash format, as a way of seeing how big it is.
07:03 JesseW (the census file is 21GB)
07:05 JesseW Would it make sense for me to run an updated census from my machine, or would it be better to bug someone with access to FOS to run it there (to minimize network delay)?
07:06 JesseW yipdw, xmc, other people with FOS access...?
07:06 xmc i don't actually have fos access, sorry
07:07 yipdw I don't think fos is a good idea
07:07 yipdw it's often heavily loaded
07:07 yipdw you'd probably do pretty well with a VPS on a fast backbone and 4-8 concurrent connections
07:08 yipdw scale up as needed or until network operations at IA bans the IP
07:08 yipdw (I don't know what that point is, but I don't think 4-8 is close)
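yipdw's suggestion of 4-8 concurrent connections can be sketched with a thread pool. This minimal Python version assumes IA's public metadata endpoint (`https://archive.org/metadata/<identifier>`), whose JSON carries a `files` list with `name` and, for most files, `md5` keys. The function names, batch size, and timeout are hypothetical, and there is no retry or rate-limit handling. The fetcher is a parameter so the logic can be exercised offline.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator

METADATA_URL = "https://archive.org/metadata/{}"
Row = tuple[str, str, str]  # (identifier, file path, md5)


def fetch_metadata(identifier: str, timeout: float = 60.0) -> dict:
    """Download one item's metadata JSON (a network call)."""
    with urllib.request.urlopen(METADATA_URL.format(identifier), timeout=timeout) as resp:
        return json.load(resp)


def md5_rows(identifier: str, metadata: dict) -> list[Row]:
    """Extract the reported md5 of every file that has one."""
    return [(identifier, f["name"], f["md5"])
            for f in metadata.get("files", []) if "md5" in f]


def census(identifiers: Iterable[str],
           fetch: Callable[[str], dict] = fetch_metadata,
           workers: int = 4, batch: int = 1000) -> Iterator[Row]:
    """Fetch metadata with `workers` concurrent requests, yielding rows in input order.

    Identifiers are submitted in batches so a 14-million-item list is not
    turned into 14 million pending futures at once.
    """
    it = iter(identifiers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while chunk := list(islice(it, batch)):
            for identifier, meta in zip(chunk, pool.map(fetch, chunk)):
                yield from md5_rows(identifier, meta)
```

A failed request raises out of `census` and stops the run; a real census would more likely log the identifier and retry later, so that a network error on the client side is not mistaken for a change on IA's side.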
07:09 JesseW ha
07:09 JesseW ok, makes sense
07:11 yipdw are you calculating MD5 sums or fetching them?
07:11 xmc might want to investigate https://blog.archive.org/2013/07/04/how-to-use-the-virtual-machine-for-researchers/
07:11 yipdw if the former, how do you distinguish between a change and an error on your end?
07:12 yipdw oh neat, I didn't know they had that VM setup
07:13 yipdw OH
07:13 JesseW xmc: yep, I just asked for one.
07:13 yipdw wait that's fos
07:13 xmc cool
07:13 yipdw or rather the same class as fos
07:13 xmc yipdw: i thought it ... yeah it kind of predates fos i guess
07:13 xmc big honkin machine
07:13 yipdw well if it's not the same thing as fos, it's at least in the same subdomain
07:14 JesseW I'm just fetching md5s.
07:14 JesseW I'm not actually doing any verifying that any actual content *matches* the reported hashes -- merely whether what is reported has changed.
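Checking whether *reported* hashes changed between two censuses reduces to a keyed diff. At this scale (the old census file alone is 21GB) an in-memory dictionary may not fit, so this minimal Python sketch assumes both inputs are already sorted by (identifier, file path) and streams through them once, merge-join style. `diff_sorted` is a hypothetical name, and duplicate keys within one input are not handled.

```python
from typing import Iterable, Iterator

Row = tuple[str, str, str]  # (identifier, file path, md5)


def diff_sorted(old: Iterable[Row], new: Iterable[Row]) -> Iterator[tuple[str, str, str]]:
    """Yield (status, identifier, path) where status is 'added', 'removed' or 'changed'.

    Both inputs must be sorted by (identifier, path) in the same order.
    """
    old_it, new_it = iter(old), iter(new)
    o = next(old_it, None)
    n = next(new_it, None)
    while o is not None or n is not None:
        if n is None or (o is not None and o[:2] < n[:2]):
            # Key only in the old census: the file is no longer reported.
            yield ("removed", o[0], o[1])
            o = next(old_it, None)
        elif o is None or n[:2] < o[:2]:
            # Key only in the new census: a newly reported file.
            yield ("added", n[0], n[1])
            n = next(new_it, None)
        else:
            # Same key in both: compare the reported md5 values.
            if o[2] != n[2]:
                yield ("changed", o[0], o[1])
            o = next(old_it, None)
            n = next(new_it, None)
```

Each input can be any iterable of (identifier, path, md5) tuples, for example lines of a sorted tab-delimited file split on tabs. Memory use stays constant regardless of census size.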
07:15 SketchCow FOS is garbage box and the one before it was garbage box too
07:16 yipdw if that's a garbage box I'm going dumpster diving at IA
07:16 xmc ah, you're fetching metadata
07:16 yipdw JesseW is the NSA
07:19 * JesseW has been found out!
07:19 * JesseW hands out dark glasses to everyone
07:21 godane i'm up to 513k items as of today
07:22 godane also kpfa has all 2007 mp3s uploaded now
07:22 godane bbl
07:30 BlueMaxim has quit IRC (Read error: Connection reset by peer)
07:30 BlueMaxim has joined #archiveteam
07:49 kyan has quit IRC (Ping timeout: 633 seconds)
08:03 balrog has quit IRC (Read error: Operation timed out)
08:31 antomatic has joined #archiveteam
08:31 HCross2 has joined #archiveteam
08:31 wutno has joined #archiveteam
08:31 GLaDOS has joined #archiveteam
08:31 _desu___ has joined #archiveteam
08:31 Peetz0r_ has joined #archiveteam
08:31 aliz has joined #archiveteam
08:31 wp494 has joined #archiveteam
08:31 edsu has joined #archiveteam
08:31 hive-mind has joined #archiveteam
08:31 Famicoma1 has joined #archiveteam
08:31 ivan` has joined #archiveteam
08:31 PepsiMax_ has joined #archiveteam
08:31 SilSte has joined #archiveteam
08:31 Kazzy has joined #archiveteam
08:31 efnet.port80.se sets mode: +oooo wp494 edsu ivan` Kazzy
08:31 mistym has joined #archiveteam
08:31 tjg has joined #archiveteam
08:31 zhongfu has joined #archiveteam
08:31 pikhq has joined #archiveteam
08:31 Kenshin has joined #archiveteam
08:31 Rickster has joined #archiveteam
08:31 Atluxity has joined #archiveteam
08:31 Ctrl-S___ has joined #archiveteam
08:31 VonGuard has joined #archiveteam
08:31 zyphlar_ has joined #archiveteam
08:31 beeper has joined #archiveteam
08:31 kevin has joined #archiveteam
08:31 bauruine has joined #archiveteam
08:31 karissa__ has joined #archiveteam
08:31 sigkell has joined #archiveteam
08:31 Gfy has joined #archiveteam
08:31 Fusl has joined #archiveteam
08:31 hades_ has joined #archiveteam
08:31 joepie91 has joined #archiveteam
08:31 SadDM has joined #archiveteam
08:31 efnet.port80.se sets mode: +oooo Kenshin Atluxity joepie91 SadDM
08:31 redlob has joined #archiveteam
08:31 JSharp___ has joined #archiveteam
08:31 Vito` has joined #archiveteam
08:31 deathy has joined #archiveteam
08:31 abartov__ has joined #archiveteam
08:31 johtso has joined #archiveteam
08:34 unstable has joined #archiveteam
08:38 JesseW has quit IRC (Read error: Operation timed out)
08:44 megaminxw has joined #archiveteam
08:50 megaminxw has quit IRC (Quit: Leaving.)
08:51 megaminxw has joined #archiveteam
08:56 BlueMaxim has quit IRC (Read error: Connection reset by peer)
08:57 BlueMaxim has joined #archiveteam
09:02 balrog has joined #archiveteam
09:20 atomotic has joined #archiveteam
10:16 wickedpla has joined #archiveteam
10:17 wp494 has quit IRC (Ping timeout: 260 seconds)
11:06 arkiver3 has joined #archiveteam
11:12 atomotic has quit IRC (Ping timeout: 260 seconds)
11:29 atomotic has joined #archiveteam
11:30 arkiver3 has quit IRC (Ping timeout: 252 seconds)
11:59 BlueMaxim has quit IRC (Quit: Leaving)
12:10 MMovie has quit IRC (Read error: Operation timed out)
12:30 atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
12:48 MMovie has joined #archiveteam
12:53 eightfold has joined #archiveteam
12:53 eightfold hi there
12:53 eightfold any channel for general archive.org talk?
12:56 Atom-- has quit IRC (Read error: Connection reset by peer)
12:57 Atluxity are you looking for search-tips, or what?
12:57 eightfold i wonder how i can search https://archive.org/details/opensource_audio?and[]=subject%3A%22ambient%22&sort=-downloads
12:57 eightfold but only with a certain cc-license
12:58 eightfold (not ND or NC)
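One way to approach eightfold's question is IA's advanced search, which takes a Lucene-style query and can return JSON. This Python sketch only *builds* the query URL. It assumes that the `licenseurl` field holds Creative Commons URLs such as `.../licenses/by-nc-nd/3.0/` and that the search backend accepts wildcards on that field; neither is verified here. Items with no `licenseurl` at all (the case eightfold asks about next) are excluded. The function name is hypothetical.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://archive.org/advancedsearch.php"


def license_filtered_search_url(collection: str, subject: str, rows: int = 100) -> str:
    """Build an advanced-search URL for CC-licensed items that are neither ND nor NC."""
    query = (
        f'collection:{collection} AND subject:"{subject}" '
        "AND licenseurl:*creativecommons* "
        # Assumption: ND/NC licence URLs contain "-nd" / "-nc" (e.g. by-nc-nd).
        "AND NOT licenseurl:*-nd* AND NOT licenseurl:*-nc*"
    )
    params = [
        ("q", query),
        ("fl[]", "identifier"),
        ("fl[]", "licenseurl"),
        ("sort[]", "downloads desc"),
        ("rows", str(rows)),
        ("output", "json"),
    ]
    return f"{SEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(license_filtered_search_url("opensource_audio", "ambient"))
```

The JSON response should list matches under `response.docs`; spot-checking a few returned `licenseurl` values by hand is worthwhile before trusting the filter.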
13:00 eightfold i also wonder what rules apply when there’s no “usage” information, as with this: https://archive.org/details/pertin-nce_053
13:00 eightfold which license applies then?
13:02 VADemon has joined #archiveteam
13:05 Atluxity maybe someone knows
13:06 Atluxity hang on, and maybe you'll see
13:06 Atom-- has joined #archiveteam
13:09 phuzion eightfold: there's always #internetarchive
13:09 eightfold phuzion: thanks!
13:09 phuzion No problem.
13:09 atomotic has joined #archiveteam
13:42 RichardG has quit IRC (Quit: Keyboard not found, press F1 to continue)
13:47 RichardG has joined #archiveteam
13:55 mistym has quit IRC (Remote host closed the connection)
14:03 joepie91 eightfold: http://cryto.net/~joepie91/blog/2013/03/21/licensing-for-beginners/ (this answers your "what if no license is specified" question)
14:05 eightfold joepie91: i guess you are referring to “Copyright is a whitelisting system. This means that by default, no one has the right to do anything with what you made, unless you explicitly indicate that they do.”
14:05 eightfold but some of these are listed under “community audio”, which is also known as “open source audio”
14:05 nertzy has joined #archiveteam
14:07 joepie91 eightfold: right. but anybody can upload into the community categories (which is most likely precisely why they were renamed to 'community' instead of 'open source')
14:08 WinterFox has quit IRC (Remote host closed the connection)
14:09 eightfold joepie91: ok. so technically archive.org would be fine for hosting all the copyrighted material of an online video broadcasting service?
14:10 joepie91 eightfold: go ahead - if it gets abusemail, it'll get "darked" (which means it is made inaccessible and remains that way for the time being, but still stored in the archive)
14:11 nertzy has quit IRC (Quit: Leaving)
14:12 eightfold joepie91: of course i won’t go ahead, i’m not an online broadcasting service. but what you’re saying is that the internet archive is totally fine with storing other people’s copyrighted works?
14:13 joepie91 eightfold: that is not the official position, but you'll find that very few people around these regions care about copyright, and that getting stuff archived is the first priority
14:13 joepie91 you won't get in trouble for doing it anyway.
14:14 joepie91 (personally I consider copyright to be extremely harmful, but that's more of an ethical discussion and less related to the practical aspects of your question, also I'm not the Internet Archive :P)
14:17 Atluxity looks like I am completing gamefront-grab
14:18 HCross Need me to lend a hand?
14:18 HCross I've got some spare power
14:19 Atluxity me too :)
14:20 Atluxity Kazzy: are you grabbing google-code as "Kaz"?
14:22 nertzy has joined #archiveteam
14:36 megaminxw has quit IRC (Quit: Leaving.)
14:55 joepie91 SketchCow: https://www.youtube.com/channel/UCxfh-2aOR5hZUjxJLQ2CIHw - bunch of videos and livestreams of conference talks, including one that's going on about JS in Svalbard right now
15:02 K4k has joined #archiveteam
15:12 Kazzy Atluxity: yeah
15:12 Kazzy did i break something?
15:21 Amitari has joined #archiveteam
15:23 Amitari On the Geocities article on the wiki, it says "The content is still provided via the same patched torrent above. However, bear in mind Dragan Espenschied has completely redone the thing. Super superior. He spent a year on it.", does anyone know where I can get that improved version?
15:23 Amitari I'm considering getting an extra hard drive just to store and seed Geocities.
15:32 Amitari I've also had a really cool idea for experiencing all these old archived websites.
15:34 Amitari Taking the source code from an early open-source release of Netscape, and updating it a bit to work on more modern OSes.
15:35 Amitari has quit IRC (Leaving)
15:37 DFJustin have you seen http://oldweb.today/
15:39 Atluxity Kazzy: no, I just wondered what your concurrent setting on it is
15:40 Kazzy 10 concurrent
15:40 Kazzy didn't see anything relating to having to limit it so i'm not blocked by google, so just went at it
15:41 atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
15:46 joepie91 PSA: bash.org is gone, our latest copy is from 2015-11-17, but I don't think it's been updated recently anyway
15:49 Atluxity Kazzy: on just 1 host?
15:49 Kazzy yep
15:49 Atluxity wow...
15:49 Kazzy ?
15:50 Atluxity just thought you was so effective at getting items
15:50 Atluxity I run 1 concurrent on my 49 hosts
15:50 phuzion joepie91: RIP bash.org
15:51 joepie91 yep
15:51 joepie91 was a matter of time I suppose
15:51 joepie91 in hindsight
15:51 phuzion Yeah
15:51 Kazzy well, i did away with the whole increasing delay thing, so it doesn't sit idle for like 5 minutes to get a job
15:52 Atluxity aha
15:52 Atluxity that's your trick :D
15:53 Kazzy try to keep that host doing as much as possible, nice to not have it sitting idle generating noise, might as well be working
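Kazzy's tweak removes the growing wait between empty job requests. For illustration only, here is a generic capped exponential backoff in Python; it is not the Warrior/seesaw code. The base, factor, and 300-second cap (roughly the five minutes mentioned) are hypothetical values, and `request_job` stands in for whatever asks the tracker for work.

```python
import time
from typing import Callable, Iterator, Optional


def backoff_delays(base: float = 10, factor: float = 2, cap: float = 300) -> Iterator[float]:
    """Yield an endless, growing sequence of waits, never exceeding `cap` seconds."""
    delay = base
    while True:
        yield delay
        delay = min(delay * factor, cap)


def poll_for_job(request_job: Callable[[], Optional[str]],
                 delays: Iterator[float],
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Ask for work until some arrives, sleeping per `delays` after each empty reply."""
    for delay in delays:
        job = request_job()
        if job is not None:
            return job
        sleep(delay)
    raise RuntimeError("delay schedule ran out before a job arrived")
```

With the defaults, a client that has seen six empty replies in a row waits 300 seconds before asking again, even if work appeared a moment later. Passing `itertools.repeat(10)` as `delays` instead gives a fixed 10-second retry, which keeps a host busier at the cost of more requests to the tracker.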
15:56 antonizoo has quit IRC (Remote host closed the connection!)
16:14 toad2 has joined #archiveteam
16:20 toad1 has quit IRC (Read error: Operation timed out)
16:43 nertzy2 has joined #archiveteam
16:43 nertzy has quit IRC (Read error: Connection reset by peer)
16:51 schbirid2 has quit IRC (Quit: Leaving)
16:59 atomotic has joined #archiveteam
17:07 Emcy has joined #archiveteam
17:14 RichardG has quit IRC (Ping timeout: 258 seconds)
17:25 eightfold has quit IRC (Quit: eightfold)
17:27 zhongfu has quit IRC (Ping timeout: 260 seconds)
17:32 atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
17:43 zino has joined #archiveteam
17:45 K4k has quit IRC (WeeChat 1.3)
17:54 K4k has joined #archiveteam
18:02 VADemon has quit IRC (left4dead)
18:14 schbirid has joined #archiveteam
18:17 JesseW has joined #archiveteam
18:50 dashcloud has quit IRC (Quit: No Ping reply in 210 seconds.)
18:55 dashcloud has joined #archiveteam
18:56 terburg has joined #archiveteam
19:11 RichardG has joined #archiveteam
19:15 K4k has quit IRC (Ping timeout: 250 seconds)
19:18 K4k has joined #archiveteam
19:19 atomotic has joined #archiveteam
19:24 K4k_ has joined #archiveteam
19:25 K4k has quit IRC (Ping timeout: 360 seconds)
19:45 scyther has joined #archiveteam
19:45 scyther has quit IRC (Connection closed)
19:45 nertzy2 has quit IRC (Quit: This computer has gone to sleep)
20:02 JesseW with Jake of IA's advice, I'm running an updated IA census
20:02 JesseW should be done in a couple of days. :-)
20:03 ersi What's that? IA census?
20:04 scyther has joined #archiveteam
20:04 JesseW ersi: http://archiveteam.org/index.php?title=Internet_Archive_Census
20:05 ersi Oh, right! Of course
20:05 JesseW list of the sizes, formats and md5s of *all* the (public) files on IA
20:05 JesseW I want to see what's changed since last March. :-)
20:05 JesseW currently, I'm just running a recheck of the existing 14 million itemlist. Jake will get me an updated list later.
20:07 terburg has quit IRC (Quit: terburg)
20:11 Ghost_of_ has joined #archiveteam
20:12 Atluxity you can get access to my iron if it helps you
20:13 JesseW It seems like it's bound by IA's limits, so I'm fine.
20:40 JesseW has quit IRC (Ping timeout: 246 seconds)
20:45 atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
21:01 BlueMaxim has joined #archiveteam
21:06 JetBalsa has joined #archiveteam
21:18 terburg has joined #archiveteam
21:49 terburg has quit IRC (terburg)
21:49 scyther has quit IRC (Read error: Connection reset by peer)
22:24 SketchCow CUTE OVERLOAD
22:30 mistym has joined #archiveteam
22:34 SketchCow So, I am going to start making little piles of musicbrainz
22:34 SketchCow Because that thing is past 1.1tb
22:35 nertzy2 has joined #archiveteam
23:12 K4k_ has quit IRC (Ping timeout: 360 seconds)
23:13 nertzy2 has quit IRC (Quit: This computer has gone to sleep)
