[01:29] uhhh...
[01:29] hey SketchCow, why is https://archive.org/details/archiveteam-warrior darked?
[01:31] this is very strange
[01:31] there's no dark job or anything
[01:31] * joepie91 blinks
[01:31] wait, why can I see the history in the first place?
[01:32] history view just needs an account I think
[01:33] ah
[01:33] still, why is it darked
[01:33] or well, at least I think it's darked
[01:33] someone messaged me that the download didn't work...
[01:41] joepie91: the last change to log was 215 days ago
[01:41] yes, that is what I noticed
[01:42] normally some event will say it's dark in the history
[06:25] anyone know what bittorrent tracker archive.org is using?
[06:32] deathy: its own
[06:35] ah..found in the FAQ "We are using opentracker, which has proven to be highly scalable."
[06:38] This might be dangerous? "Starting in 2011, the Internet Archive began automatically retrieving BitTorrent files uploaded into most Community collections."
[06:39] naah
[06:39] more like 'awesome'
[06:40] I began crawling some bittorrent dumps a few weeks ago. Have ~6 Million .torrent files lying around..
[06:40] zip & upload
[06:42] I put this up the other day which is 873,671 old torrents from the pirate bay https://archive.org/details/TPB_index_20090815
[06:42] helpfully downloaded from another torrent via the aforementioned automatic feature
[06:56] I just have them as "INFOHASH.torrent" files, with a few folders based on filename (just the torrents, no meta). Is that, zipped, an acceptable upload format? ..
[07:01] How do you guys view warc files?
[07:04] deathy: what is the use of old torrent files?
[07:08] just found the chat logs when i told you guys about dl.tv and crankygeeks not being on archive.org: http://badcheese.com/~steve/atlogs/?chan=archiveteam&day=2011-09-27
[07:08] odie5533_: in my case, possible research. hopefully doing my master's thesis bittorrent-related. For anyone else...I don't know. Backup everything?
[07:09] deathy: care to elaborate on your plans for them?
[07:11] odie5533_: initially with these, getting whatever stats I can. Long term..not completely decided yet..
[07:15] odie5533_: re: viewing: https://github.com/ArchiveTeam/warc-proxy
[07:15] yipdw: is that the only thing you use?
[07:26] Are there any other programs for viewing warcs, or does everyone use warc-proxy?
[07:27] odie5533_: well, there's wayback
[07:27] but warc-proxy requires a lot less supporting software
[07:27] Does everyone here use warc-proxy?
[07:33] not *all* of the 146 lurkers.
[07:33] but anyone that views warc files uses it I guess.
[07:33] * phillipsj shrug
[07:34] Do most people on archiveteam view warc files, or just upload them and not look at the contents?
[07:34] i just run the warrior
[07:34] fuck the rest of that noise
[07:37] aMunster: how can you be sure you're doing anything?
[07:37] actually, i run the scripts outside of warrior
[07:37] i check in on them every few days, and then lurk
[07:38] aMunster: how do you run them outside of the warrior?
[07:38] http://archiveteam.org/index.php?title=Puu.sh
[07:38] project source link: https://github.com/ArchiveTeam/puush-grab
[07:39] then, the tracker such as http://chfoo-d1.mooo.com:8031/puush/#show-all shows highscores
[07:39] ooh high scores!
[07:40] woah that one dude has over 1 TB!
[07:40] how many IPs do you have at your disposal?
[07:40] I have a VPN
[07:40] so maybe a few dozen
[07:41] you'll need more than that to beat those scores
[07:41] But I have like no bandwidth, so it barely matters.
[07:41] don't even have enough bandwidth to download things I want for myself, let alone for archiveteam =/
[07:41] college, eh
[07:41] no.
[07:42] no excuse for bandwidth problems then
[07:42] :(
[07:42] heh
[07:43] What is happening to puu.sh?
[07:43] one month link expiration
[07:43] but they said they were offering "permanent storage"
[07:43] how can "permanent storage" end?
[07:43] terms of service change often
[07:44] How can I look at some of the puu.sh files that have been saved?
[07:45] https://archive.org/details/archiveteam_puush
[07:45] those are like 10 GB.
[07:46] they're split into those chunks, yes
[07:46] I just wanted to look at a few small files.
[07:47] What type of files are they? All images?
[07:47] anything that's on puush
[07:47] videos too
[07:47] Can you give me an example? I can't seem to find a "search" button on the puush site
[07:48] http://puu.sh/4gMPk
[07:48] I see.
[07:50] we don't even know who these images belong to right?
[07:51] that isn't part of the scope i think
[07:52] I looked through a few images and didn't find any I'd want to save.
[07:53] SketchCow: can please start giving me full access to may collections
[07:54] *my collections
[07:54] i just took 20 mins uploading an episode of diggnation to my diggnationseries just to figure out i don't have access to it
[07:56] based on what i can tell i could move stuff from godaneinbox once i have access to a collection to move it to
[08:00] So, is there a way to easily access individual files from those giant archives?
[08:01] or is the only way to download the entire 10 GB chunks and play them with e.g. the warc-proxy
[08:01] that's the only way
[08:28] No, that's not the only way.
[08:28] You could parse the CDX file and get the byte offsets and get only the parts you want. It's a bit more hassle though.
[08:29] ersi: hmm, interesting idea.
[08:29] Does archive.org support byte-by-byte download?
[08:31] Also, what is the difference between archiveteam_puush_20131025041147.cdx.gz and puush_20131025041147.megawarc.warc.gz ?
[08:31] It's called "byterange" download and yes.
[08:32] cdx is an index
[08:32] sorry, I pasted the wrong bit
[08:32] there is both an Item CDX Index and a WARC CDX Index
[08:34] they have the same contents.
[08:34] I'm not sure. But maybe the "WARC CDX index" is generated when the item is uploaded and the "item CDX index" is what we've generated and uploaded
[08:35] right, and they should be identical, though the gzip setting makes a difference in .gz size
[11:57] deathy: if you want some bigger test cases, I'm uploading them :) https://archive.org/details/ftp-ftp.hp.com_pub-2013-10
[11:57] $ lrztar -l softlib/software10
[11:57] Compression Ratio: 1.186. Average Compression Speed: 4.864MB/s.
[11:57] Total time: 19:32:02.90
[11:57] directory down to 282 GB, not super-impressing but better than nothing
[11:59] WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD
[13:09] "yahoosucks" :)
[13:17] he left ages ago
[13:17] we suck at customer service
[13:17] 0/10 would never do it again
[13:19] heh, yeah - I got join/parts off
[13:20] I still like saying yahoosucks though
[13:20] but yahoo does suck!
[14:13] odie5533_: the item cdx index is an index of all the warc files on the item whereas the warc cdx index is associated with one warc file
[14:13] if the item has only one warc they would be the same
[14:14] ah. that makes sense
[14:14] for viewing results there is also http://www.archiveteam.org/index.php?title=The_WARC_Ecosystem#warc_to_zip
[14:15] yeah, I saw that. didn't seem as useful. thanks though
[14:15] or once it has been indexed into the wayback machine you can do a search like http://web.archive.org/web/*/http://puu.sh/* and compare the capture dates with the warc date
[14:16] I had uploaded a warc a while back, but it never showed up in the wayback machine.
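(Editor's aside on the 08:28 suggestion to parse the CDX file and fetch only the wanted bytes. What follows is a rough sketch, not anything posted in the channel. It assumes the index starts with a " CDX ..." header line whose field letters include S (compressed record size) and V (compressed file offset), as in the commonly seen 11-field layout, and that every record in the .warc.gz is its own gzip member. The names parse_cdx, byte_range and fetch_record are made up for illustration, and the URL passed to fetch_record is whatever download link the item exposes.)

```python
import gzip
import urllib.request


def parse_cdx(lines):
    """Yield one dict per CDX record, keyed by the header's field letters."""
    fields = None
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith(" CDX"):
            # Header line, e.g. " CDX N b a m s k r M S V g"
            fields = line.split()[1:]
            continue
        if fields is None:
            raise ValueError("no CDX header line seen before the first record")
        yield dict(zip(fields, line.split()))


def byte_range(record):
    """Build an HTTP Range header value from one parsed CDX record.

    Assumes the index carries both V (offset) and S (compressed size).
    """
    offset = int(record["V"])
    length = int(record["S"])
    # HTTP byte ranges are inclusive on both ends.
    return "bytes=%d-%d" % (offset, offset + length - 1)


def fetch_record(warc_url, record):
    """Download and decompress a single record from a remote .warc.gz.

    Only works if the server honours Range requests and the record is a
    standalone gzip member (both are assumptions here).
    """
    request = urllib.request.Request(
        warc_url, headers={"Range": byte_range(record)}
    )
    with urllib.request.urlopen(request) as response:
        return gzip.decompress(response.read())
```

If the index at hand has no S column, the length would have to be worked out from the offset of the next record in the same file instead.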
[14:16] what's the item
[14:17] aww
[14:17] your link crashed my firefox
[14:17] yeah that works better for sites that are less omghuge
[14:18] https://archive.org/details/journals.math.tku.edu.tw_2013-01-06_mirror
[14:19] it needs to be in one of the web collections with mediatype=web in order to be added to wayback
[14:19] and not have a restrictive robots.txt
[14:19] robots looks ok in this case
[14:19] doesn't seem to let me change the mediatype
[14:19] an admin needs to do it
[14:20] I don't know any admins =/
[14:20] Yeah, just saying that it's a criterion.
[14:21] jscott@archive.org
[14:21] (^ That's SketchCow's Internet Archive e-mail)
[14:21] but SketchCow is right here.
[14:22] SketchCow is very busy. Put it on the pile by sending an e-mail. :-)
[14:22] alright
[18:28] Good morning
[18:29] Fixed.
[18:36] SketchCow: awesome; what was the issue?
[18:50] odie5533 can't upload web.
[18:52] hello world
[18:52] anyone have a good source of images from 5.25 floppys?
[18:52] games/apps, idk
[18:52] preferably stuff that will run on like dos or win3.1
[19:03] WiK: you could look at the older Twilight CDs
[19:04] afaik a bunch of stuff on there came from floppy-distributed things
[19:04] godane has been uploading them to IA
[19:05] in terms of images there's a bunch in the ibm5150 section of https://archive.org/download/MESS_0.149_Software_List_ROMs/MESS_0.149_Software_List_ROMs.zip/
[19:05] most of it is booters and not dos/windows based though
[19:09] There are some 20GB+ dos game packs that show up on nzb search sites. Probably torrents too but they might have died :(
[19:10] a bunch of those are on ia now
[19:10] https://archive.org/search.php?query=collection%3Avintagesoftware%20dos
[19:11] DFJustin: perfect, thanks
[19:11] and of course https://archive.org/details/classicpcgames
[19:13] now, to see if i can get my kryoflux to write these to 5.25 floppies
[19:16] Wik I have access to some physical 360k floppies :)
[19:16] dumps plz
[19:17] phillipsj: ive got 400 5.25 PC/AT floppies, brand new
[19:18] Not sure about the copyright status of all the software. Much of it is free/trialware though (from a BBS)
[19:18] no one really cares with stuff that old
[19:19] seems i need to make an IPF or some kind of image file from the dos files, and then i can push via kryoflux
[19:20] what os
[19:21] win 7, but ive access to cygwin/linux
[19:21] http://www.winimage.com/
[19:22] I dunno if kryoflux supports dos sector images for writing though, it didn't last I checked but it's been a while
[19:22] this is getting to be -bs btw
[19:22] DFJustin, if that is the case, why does copyright last so long?
[19:22] DFJustin: hoping it does
[19:23] people care about stuff that still makes gazillions of dollars like mickey mouse
[19:23] cga games, not so much
[19:24] I will note that http://www.gog.com/ is awesome and if you want to pay for some old games that is a way to go
[19:25] I can second gog being awesome
[19:26] but there's hardly anything from the 360k era still available that way
[19:27] DFJustin, gog wraps them in icky windows binaries :P
[21:04] Anyone here live in Colorado besides swebb and myself?
[21:07] Also, I've brought up the whole unsold TV pilots thing before. I was thinking about it recently and was wondering if a Kickstarter/IndieGoGo would work to purchase those lost television pilots and make them publicly available for free.
[21:09] I don't know much about KickStarter/IndieGoGo so I'm not sure if that kind of stuff would work. I know someone here said the pilots are locked away in the studio's 'vault' and never seen again.. But could we raise enough money to buy them from the NBC's and CBS's?
[21:09] Again I have little to no idea how studios do business and crowd funding... Just me thinking out loud.
[21:31] With more work, I may be able to copy C64 160k floppies (drives are notorious for going out of alignment)
[21:34] ah yes, hand aligning 1541s with just an oscope was good for $30 a pop back in the day