#archiveteam-bs 2016-04-10,Sun


Time Nickname Message
00:00 πŸ”— arkhive has left
00:16 πŸ”— joepie91 arkiver: status is Stalled, but it was linked in #revspace today
00:16 πŸ”— joepie91 so idk
00:51 πŸ”— wyatt8740 still confused - why does archive.org retroactively apply a robots.txt?
00:51 πŸ”— wyatt8740 it just means that a domain parker can ruin more days than they already have
01:01 πŸ”— dan- I presume it probably means less trouble for them, less site owners yelling and causing issues
01:09 πŸ”— wyatt8740 Is there a way to get the content that was blocked retroactively? Not got any site in mind at the moment but it just came up in conversation on another IRC channel
01:09 πŸ”— joepie91 wyatt8740: officially, no.
01:09 πŸ”— wyatt8740 I know archive team often has something stashed away, but there's no public way, is there?
01:09 πŸ”— wyatt8740 ah, okay
01:10 πŸ”— * joepie91 coughs
01:10 πŸ”— * joepie91 coughs very loudly and inconspicuously in the general direction of wyatt8740
01:12 πŸ”— wyatt8740 cover your mouth joepie91
01:12 πŸ”— * joepie91 spreads bacteria
01:37 πŸ”— JesseW wyatt8740: because it's *really hard* to tell, in general, if the current person with access to a domain has no rights to content previously hosted there. And if they *do* have rights, then IA doesn't want to piss them off by making available material they don't want made available.
01:38 πŸ”— wyatt8740 I understand the second use case
01:38 πŸ”— wyatt8740 I tried to find the old runescape website circa 2006 once
01:38 πŸ”— wyatt8740 that was understandable though disappointing
01:40 πŸ”— JesseW which use case do you not understand, then?
01:58 πŸ”— DFJustin the basic thing is IA has a very small staff and following robots.txt is an easy way to get people off their backs without having to spend a lot of time processing requests from people to take down stuff
01:58 πŸ”— Frogging it's unfortunate but I do see the reasoning
01:59 πŸ”— Frogging Could try getting domain parkers to make their robots.txt IA-friendly, but good luck with that
02:00 πŸ”— DFJustin the issue of parked domains comes up from time to time and it would probably be possible to have some better handling of that but we aren't them so there's not much point in asking us
02:00 πŸ”— Frogging although, I think most of the major squatters only break certain, usually irrelevant subdirectories
02:01 πŸ”— JesseW there are also known bugs with the way IA interprets robots.txt files -- I know of at least one example where the file doesn't disallow access, but the Wayback Machine does.
02:01 πŸ”— JesseW but it's a low priority
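The robots.txt behaviour discussed above can be illustrated with a minimal sketch using Python's standard urllib.robotparser. It assumes the Wayback Machine honours the `ia_archiver` user-agent token; the two robots.txt bodies and the example URL are made up for illustration. This is not IA's actual matching code, which, as noted in the discussion, has quirks of its own.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt a domain parker might serve: blocks every crawler.
PARKED = """User-agent: *
Disallow: /
"""

# Hypothetical "IA-friendly" variant: everything blocked except ia_archiver.
# An empty Disallow value means "nothing is disallowed" for that agent.
IA_FRIENDLY = """User-agent: ia_archiver
Disallow:

User-agent: *
Disallow: /
"""

URL = "http://example.com/old/page.html"  # placeholder URL

if __name__ == "__main__":
    print(allowed(PARKED, "ia_archiver", URL))
    print(allowed(IA_FRIENDLY, "ia_archiver", URL))
```

Because the exclusion is applied retroactively, the first file would hide every earlier capture of the domain, while the second would leave them visible.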
02:02 πŸ”— Frogging That doesn't sound like something that should be hard to fix, but I'm not them
02:03 πŸ”— Frogging anyway it's good that third-party WARCs like ours are still available even if the Wayback Machine blocks playback
02:04 πŸ”— bwn https://archive.org/donate/
02:04 πŸ”— JesseW yep. :-)
02:04 πŸ”— bwn i'm pretty damn impressed by what 170 people can do
02:04 πŸ”— Frogging When I get some money I plan to :p
02:05 πŸ”— JesseW bwn: I did, during the telethon. They sent me cookies. :-)
02:05 πŸ”— Frogging 170? That's it? Neat
02:05 πŸ”— JesseW And a lot of that is the manual work of book scanning.
02:05 πŸ”— JesseW which is important, don't get me wrong
02:06 πŸ”— Frogging for sure
02:07 πŸ”— bwn i can't imagine, i've almost been driven insane making photocopies at work ;)
02:10 πŸ”— Frogging JesseW: heh. Tumblr. Yeah, being owned by Yahoo is worrisome, but it's one of their more successful assets. However, it has no business model
02:10 πŸ”— Frogging at least I don't think it does. I never saw any ads
02:11 πŸ”— JesseW It has quite a lot of ads in the *dashboard* now.
02:11 πŸ”— Frogging hmm
02:11 πŸ”— JesseW If you just read people's blogs, you won't see them, though.
02:11 πŸ”— JesseW It's not a *good* business model.
02:11 πŸ”— Frogging I've "archived" some porn
02:11 πŸ”— Frogging :p
02:13 πŸ”— Frogging I would throw this in ArchiveBot but it looks like it's all on Wayback anyway with all the images http://positivedoodles.tumblr.com/
02:20 πŸ”— JesseW I'm mostly interested in saving various really thoughtful and insightful essays that people have written, which will get obliterated once Tumblr dies.
02:37 πŸ”— bwn has quit IRC (Ping timeout: 492 seconds)
03:37 πŸ”— wyatt8740 For a while my no-ip site was blocked by robots.txt because some other no-ip site blocked the archiver
03:38 πŸ”— wyatt8740 I did scan complete atari service manuals at work once
03:38 πŸ”— wyatt8740 still got an IBM ASCII serial terminal I have to scan the manual for, but it's not in a 3 ring binder so I need to set up a camera + tripod + light style book 'scanner'.
03:39 πŸ”— wyatt8740 (the archive bot has already crawled the atari manuals a number of times)
03:40 πŸ”— wyatt8740 It'd be nice if I could get a job at some place like the IA. But I have no degree and am in Indiana so technology jobs are scarce, especially for people without one.
03:41 πŸ”— wyatt8740 C programming and knowledge of analog electronics don't get you far without that paper
03:41 πŸ”— wyatt8740 :(
03:41 πŸ”— wyatt8740 170 is the number at the internet archive, right?
03:41 πŸ”— wyatt8740 not here
04:06 πŸ”— Sk1d has quit IRC (Ping timeout: 194 seconds)
04:12 πŸ”— Sk1d has joined #archiveteam-bs
04:27 πŸ”— Frogging JesseW: That's a good cause. Fortunately it does look like a lot of it is in the Wayback machine in some form.
04:27 πŸ”— Frogging just from IA crawling
04:40 πŸ”— BlueMaxim has quit IRC (Read error: Operation timed out)
04:48 πŸ”— Stilett0 is now known as Stiletto
04:49 πŸ”— BlueMaxim has joined #archiveteam-bs
04:50 πŸ”— JesseW there is a lot in there, but there is also a lot *not* in there. Tumblr is REALLY BIG
04:51 πŸ”— JesseW Tumblr is big like Wordpress.com or blogger.com are big. :-(
04:52 πŸ”— bwn has joined #archiveteam-bs
06:11 πŸ”— GLaDOS has quit IRC (Read error: Connection reset by peer)
06:15 πŸ”— GLaDOS has joined #archiveteam-bs
06:45 πŸ”— BlueMaxim has quit IRC (Read error: Operation timed out)
06:46 πŸ”— BlueMaxim has joined #archiveteam-bs
06:51 πŸ”— JesseW has quit IRC (Read error: Operation timed out)
06:58 πŸ”— tomwsmf-a has quit IRC (Ping timeout: 259 seconds)
06:58 πŸ”— Jonimus has quit IRC (Read error: Operation timed out)
06:59 πŸ”— Honno has joined #archiveteam-bs
07:01 πŸ”— Honno_ has joined #archiveteam-bs
07:03 πŸ”— Jonimus has joined #archiveteam-bs
07:03 πŸ”— swebb sets mode: +o Jonimus
07:06 πŸ”— metalcamp has joined #archiveteam-bs
07:08 πŸ”— Honno has quit IRC (Read error: Operation timed out)
07:17 πŸ”— BlueMaxim has quit IRC (Read error: Connection reset by peer)
07:18 πŸ”— BlueMaxim has joined #archiveteam-bs
07:46 πŸ”— bwn has quit IRC (Ping timeout: 492 seconds)
08:37 πŸ”— bwn has joined #archiveteam-bs
08:37 πŸ”— Jonimus has quit IRC (Read error: Operation timed out)
08:50 πŸ”— Jonimus has joined #archiveteam-bs
08:50 πŸ”— swebb sets mode: +o Jonimus
09:08 πŸ”— HCross https://www.newscientist.com/round-up/best-of-new-scientist-all-free-until-13-april-2016 O.o
10:00 πŸ”— vitzli has joined #archiveteam-bs
11:14 πŸ”— mksplg has quit IRC (WeeChat 0.4.2)
11:19 πŸ”— vitzli has quit IRC (Leaving)
11:26 πŸ”— davidar HCross: any ideas on how to archive that?
11:27 πŸ”— davidar the "by logging in for free" bit seems like it could be slightly problematic
11:27 πŸ”— HCross Not sure how to get it, but https://www.newscientist.com/search/?s=* has a load of articles
11:28 πŸ”— davidar HCross: yeah, I know someone who's currently crawling those search results to get a list of URLs
12:05 πŸ”— PurpleSym Looking at this site's source code the first thing you'll see is: <?php die(); ?>. Not kidding.
12:06 πŸ”— PurpleSym (Only when logged in.)
12:08 πŸ”— BlueMaxim has quit IRC (Quit: Leaving)
12:55 πŸ”— PurpleSym As for downloading the site: `curl -D - -F 'log=erkuiteiae@dontsendmespam.de' -F 'pwd=Scij@swem' -F 'rememberme=forever' 'https://www.newscientist.com/ns-login.php'`, extract newscientist-auth-cookie, wget/wpull/grab-site(?).
13:06 πŸ”— davidar thanks PurpleSym :)
13:07 πŸ”— PurpleSym You can use that account, btw. It's a throwaway.
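The "extract the cookie" step in the recipe above can be sketched as follows. This is only an illustration: it assumes the login response sets a cookie literally named `newscientist-auth-cookie` (the name used in the log; the real name on the wire was not confirmed there), and the sample header dump is invented. The function simply scans the header text that `curl -D -` prints.

```python
from typing import Optional


def extract_cookie(header_dump: str, name: str) -> Optional[str]:
    """Return the value of cookie `name` from a raw HTTP header dump, or None."""
    for line in header_dump.splitlines():
        field, sep, rest = line.partition(":")
        # Skip the status line and every header that is not Set-Cookie.
        if not sep or field.strip().lower() != "set-cookie":
            continue
        # Only the first "key=value" pair matters; attributes follow after ";".
        pair = rest.split(";", 1)[0]
        key, eq, value = pair.partition("=")
        if eq and key.strip() == name:
            return value.strip()
    return None


# Invented sample of what `curl -D -` might print; not a real response.
SAMPLE = (
    "HTTP/1.1 302 Found\r\n"
    "Set-Cookie: other=1; path=/\r\n"
    "Set-Cookie: newscientist-auth-cookie=abc123; path=/; HttpOnly\r\n"
    "Location: /\r\n"
    "\r\n"
)

if __name__ == "__main__":
    print(extract_cookie(SAMPLE, "newscientist-auth-cookie"))
```

The extracted value could then be handed to the crawler as a request header, for example through wget's `--header "Cookie: name=value"` option.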
13:14 πŸ”— vitzli has joined #archiveteam-bs
13:42 πŸ”— GLaDOS has quit IRC (Quit: Oh crap, I died.)
13:46 πŸ”— GLaDOS has joined #archiveteam-bs
13:53 πŸ”— Honno_ has quit IRC (Read error: Operation timed out)
14:00 πŸ”— RichardG has quit IRC (Quit: Keyboard not found, press F1 to continue)
14:03 πŸ”— RichardG has joined #archiveteam-bs
14:11 πŸ”— lytv has quit IRC (Ping timeout: 633 seconds)
14:17 πŸ”— lytv has joined #archiveteam-bs
14:57 πŸ”— dashcloud This is pretty awesome: it's now possible to accurately display NFOs and other items using DOS-specific fonts natively in the web browser: https://defacto2.wordpress.com/2016/04/05/ascii-nfo-art/
15:09 πŸ”— Honno has joined #archiveteam-bs
15:28 πŸ”— remsen has quit IRC (Read error: Operation timed out)
15:28 πŸ”— remsen has joined #archiveteam-bs
15:35 πŸ”— Jonimus has quit IRC (ircd.choopa.net irc.choopa.net)
15:35 πŸ”— Coderjoe has quit IRC (ircd.choopa.net irc.choopa.net)
15:35 πŸ”— mr-b has quit IRC (ircd.choopa.net irc.choopa.net)
15:35 πŸ”— lbft_ has quit IRC (ircd.choopa.net irc.choopa.net)
15:35 πŸ”— beardicus has quit IRC (ircd.choopa.net irc.choopa.net)
15:35 πŸ”— kvieta has quit IRC (ircd.choopa.net irc.choopa.net)
15:35 πŸ”— Mayonaise has quit IRC (ircd.choopa.net irc.choopa.net)
15:38 πŸ”— lbft has joined #archiveteam-bs
15:42 πŸ”— Jonimus has joined #archiveteam-bs
15:42 πŸ”— mr-b has joined #archiveteam-bs
15:42 πŸ”— beardicus has joined #archiveteam-bs
15:42 πŸ”— kvieta has joined #archiveteam-bs
15:42 πŸ”— Mayonaise has joined #archiveteam-bs
15:42 πŸ”— irc.choopa.net sets mode: +o Jonimus
15:42 πŸ”— swebb sets mode: +o Jonimus
15:44 πŸ”— balrog has quit IRC (Read error: Operation timed out)
15:47 πŸ”— balrog has joined #archiveteam-bs
15:47 πŸ”— swebb sets mode: +o balrog
15:48 πŸ”— dxrt has quit IRC (Ping timeout: 370 seconds)
15:48 πŸ”— dxrt has joined #archiveteam-bs
15:49 πŸ”— Honno_ has joined #archiveteam-bs
15:52 πŸ”— Start has quit IRC (Quit: Disconnected.)
15:55 πŸ”— Start has joined #archiveteam-bs
16:00 πŸ”— Honno has quit IRC (Read error: Operation timed out)
16:00 πŸ”— JesseW has joined #archiveteam-bs
16:13 πŸ”— Mayonaise has quit IRC (Read error: Operation timed out)
16:13 πŸ”— dxrt has quit IRC (Read error: Operation timed out)
16:14 πŸ”— mr-b has quit IRC (Read error: Operation timed out)
16:14 πŸ”— Coderjoe has joined #archiveteam-bs
16:14 πŸ”— mr-b_ has joined #archiveteam-bs
16:14 πŸ”— mr-b_ is now known as mr-b
16:14 πŸ”— dxrt has joined #archiveteam-bs
16:15 πŸ”— kvieta has quit IRC (Read error: Operation timed out)
16:15 πŸ”— beardicus has quit IRC (Read error: Operation timed out)
16:17 πŸ”— JesseW bsmith093: added to the list
16:18 πŸ”— Jonimus has quit IRC (Read error: Operation timed out)
16:20 πŸ”— vitzli JesseW, I have a small hash 'database' with about 1 million files, crc32+md5+sha1+sha2+sha3 :P + whirlpool/aich/ed2k, where available (I changed the format). This was my interest in IA's checksums - so it could be possible to crosscheck/find already archived files
16:21 πŸ”— JesseW nice!
16:23 πŸ”— vitzli and it was the reason behind visualization part, but I've not touched it yet
16:26 πŸ”— vitzli my main interest is mapping between md5/sha1 and sha256, everything else is future-proofing, hope to do ipfs hashes too
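Computing several checksums per file, as described above, can be done in a single pass over the data. The following is a minimal sketch using only Python's standard library; it covers crc32, md5, sha1, sha256 and sha3-256 (the exact sha2/sha3 variants used in the actual database are an assumption) and leaves out whirlpool, aich and ed2k, which are not in hashlib's guaranteed set.

```python
import hashlib
import io
import zlib

# Assumed algorithm set; the real database may use other sha2/sha3 variants.
ALGORITHMS = ("md5", "sha1", "sha256", "sha3_256")


def hash_stream(stream, chunk_size=1 << 20):
    """Read a binary stream once and return a dict of hex digests."""
    hashers = {name: hashlib.new(name) for name in ALGORITHMS}
    crc = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        crc = zlib.crc32(chunk, crc)  # running CRC-32 across chunks
        for hasher in hashers.values():
            hasher.update(chunk)
    digests = {name: hasher.hexdigest() for name, hasher in hashers.items()}
    digests["crc32"] = format(crc & 0xFFFFFFFF, "08x")
    return digests


if __name__ == "__main__":
    for name, digest in hash_stream(io.BytesIO(b"abc")).items():
        print(name, digest)
```

For files on disk the same function works with `open(path, "rb")`, so each file is read only once however many digests are kept.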
16:29 πŸ”— JesseW IPFS hashes aren't unique, sadly.
16:30 πŸ”— JesseW They are hashes of a particular *representation* of a bitstream, not of the unique bitstream itself.
16:30 πŸ”— JesseW So they aren't so useful for reference.
16:30 πŸ”— vitzli yes, I hope to get around with a proper database
16:31 πŸ”— JesseW Similar to BitTorrent magnet links, which are a hash of a particular torrent's info block, which can be arbitrarily modified while still referring to the same actual files.
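The point about magnet links can be shown with a small sketch: a (v1) BitTorrent info-hash is the SHA-1 of the bencoded info dictionary, so adding a field to that dictionary changes the hash even though the described file is byte-for-byte the same. The bencoder below is a minimal illustration, and the torrent metadata (file name, piece length, the extra `private` flag) is made up.

```python
import hashlib


def bencode(value) -> bytes:
    """Minimal bencoder for ints, bytes, str, lists and dicts."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, str):
        return bencode(value.encode("utf-8"))
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must be emitted in sorted order.
        items = sorted(
            (k.encode("utf-8") if isinstance(k, str) else k, v)
            for k, v in value.items()
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


def info_hash(info: dict) -> str:
    """SHA-1 of the bencoded info dictionary, as used in v1 magnet links."""
    return hashlib.sha1(bencode(info)).hexdigest()


CONTENT = b"the same file bytes in both torrents"

# Hypothetical single-file torrent; the content fits in one piece.
PLAIN = {
    "name": "example.bin",
    "length": len(CONTENT),
    "piece length": 16384,
    "pieces": hashlib.sha1(CONTENT).digest(),
}
# Same file, but with an extra flag inside the info dictionary.
PRIVATE = dict(PLAIN, private=1)

if __name__ == "__main__":
    for info in (PLAIN, PRIVATE):
        print("magnet:?xt=urn:btih:" + info_hash(info))
```

Both dictionaries describe identical content, yet they produce different magnet links, which is why such a hash is a poor stable reference to a bitstream.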
16:31 πŸ”— vitzli ipfs hashes for files are mostly ok, except they could refer to different hashing algorithms
16:32 πŸ”— vitzli 'database' currently looks like: https://archive.org/download/isohunt.moonshine.2016/isohunt.moonshine.2016.rhash.txt
16:34 πŸ”— JesseW no, the problem with ipfs hashes is that it's hashing the particular structure of sub-blocks, which can freely change arbitrarily at any time (for efficiency or other reasons)
16:34 πŸ”— vitzli hashes are going to be in one table, filenames go into another, and ipfs hashes go into hash->ipfs mapping table
16:34 πŸ”— JesseW (as I understand it)
16:35 πŸ”— vitzli I thought hashes for files/large objects are stable
16:35 πŸ”— vitzli as long as the file itself has not been changed
16:35 πŸ”— JesseW not as I understand it
16:36 πŸ”— JesseW but I don't understand IPFS very well at all (yet)
16:36 πŸ”— vitzli I don't either :)
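The three-table layout mentioned above (hashes in one table, filenames in another, and a hash-to-IPFS mapping) might look roughly like the sketch below, using Python's built-in sqlite3. All table and column names are hypothetical, as are the sample values. The IPFS table deliberately allows several rows per file, which accommodates the concern raised above that one bitstream may end up with more than one IPFS hash.

```python
import sqlite3

# Hypothetical schema; names are illustrative, not taken from the real database.
SCHEMA = """
CREATE TABLE hashes (
    id     INTEGER PRIMARY KEY,
    md5    TEXT,
    sha1   TEXT,
    sha256 TEXT UNIQUE
);
CREATE TABLE filenames (
    hash_id INTEGER NOT NULL REFERENCES hashes(id),
    name    TEXT NOT NULL
);
CREATE TABLE ipfs_hashes (
    hash_id   INTEGER NOT NULL REFERENCES hashes(id),
    ipfs_hash TEXT NOT NULL
);
CREATE INDEX idx_hashes_md5  ON hashes(md5);
CREATE INDEX idx_hashes_sha1 ON hashes(sha1);
"""


def open_db(path=":memory:"):
    """Create a database with the sketch schema and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_file(conn, md5, sha1, sha256, name):
    """Record one filename; identical content shares a single hashes row."""
    conn.execute(
        "INSERT OR IGNORE INTO hashes (md5, sha1, sha256) VALUES (?, ?, ?)",
        (md5, sha1, sha256),
    )
    hash_id = conn.execute(
        "SELECT id FROM hashes WHERE sha256 = ?", (sha256,)
    ).fetchone()[0]
    conn.execute(
        "INSERT INTO filenames (hash_id, name) VALUES (?, ?)", (hash_id, name)
    )
    return hash_id


def add_ipfs(conn, hash_id, ipfs_hash):
    """Attach one more IPFS hash to an existing file record."""
    conn.execute(
        "INSERT INTO ipfs_hashes (hash_id, ipfs_hash) VALUES (?, ?)",
        (hash_id, ipfs_hash),
    )


def sha256_for_md5(conn, md5):
    """Map an md5 digest to the sha256 digest(s) recorded for it."""
    rows = conn.execute("SELECT sha256 FROM hashes WHERE md5 = ?", (md5,))
    return [row[0] for row in rows]
```

A lookup such as `sha256_for_md5(conn, some_md5)` covers the md5-to-sha256 mapping that was named as the main interest.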
16:41 πŸ”— schbirid has joined #archiveteam-bs
16:42 πŸ”— kvieta has joined #archiveteam-bs
16:42 πŸ”— Jonimus has joined #archiveteam-bs
16:42 πŸ”— swebb sets mode: +o Jonimus
17:02 πŸ”— Mayonaise has joined #archiveteam-bs
17:18 πŸ”— beardicus has joined #archiveteam-bs
17:19 πŸ”— vitzli has quit IRC (Leaving)
17:37 πŸ”— Jonimus has quit IRC (Ping timeout: 633 seconds)
17:40 πŸ”— Honno_ has quit IRC (Read error: Operation timed out)
17:40 πŸ”— Mayonaise has quit IRC (Ping timeout: 633 seconds)
17:41 πŸ”— Mayonaise has joined #archiveteam-bs
17:43 πŸ”— Honno has joined #archiveteam-bs
17:46 πŸ”— Sanky has joined #archiveteam-bs
17:49 πŸ”— kvieta has quit IRC (Ping timeout: 633 seconds)
17:52 πŸ”— JesseW TIL that IA received (in 2014) a copy of the BBC's website as of 1995 -- and it's available through the Wayback Machine: https://web.archive.org/web/19950301190227/http://www.bbcnc.org.uk/bbctv/bbctv.html
17:53 πŸ”— kvieta has joined #archiveteam-bs
17:56 πŸ”— JesseW and the WARC is available: https://archive.org/details/bbcnc.org.uk-19950301
18:00 πŸ”— Mayonaise has quit IRC (Ping timeout: 633 seconds)
18:00 πŸ”— JesseW also, there's a typo in the description for some of the early crawls: https://archive.org/details/IA-001140-c -- this is not from 19**56**. :-)
18:20 πŸ”— kvieta has quit IRC (Read error: Operation timed out)
18:22 πŸ”— beardicus has quit IRC (Read error: Operation timed out)
18:46 πŸ”— kvieta has joined #archiveteam-bs
18:46 πŸ”— dashcloud has quit IRC (Read error: Connection reset by peer)
18:49 πŸ”— dashcloud has joined #archiveteam-bs
18:55 πŸ”— Mayonaise has joined #archiveteam-bs
19:04 πŸ”— beardicus has joined #archiveteam-bs
19:06 πŸ”— BlueMaxim has joined #archiveteam-bs
19:17 πŸ”— Jonimus has joined #archiveteam-bs
19:17 πŸ”— swebb sets mode: +o Jonimus
19:19 πŸ”— JesseW has quit IRC (Ping timeout: 370 seconds)
19:24 πŸ”— ohhdemgir https://imguwut.com/trump.mp4
19:25 πŸ”— schbirid has quit IRC (Quit: Leaving)
19:25 πŸ”— bsmith093 SketchCow yipdw_ swebb DFJustin godane anybody have a myspleen archive? i've heard there were toonheads episodes in there somewhere.
19:26 πŸ”— bsmith093 animation history is cool as hell, and it does not deserve to rot behind private trackers and on old vhs tapes
19:33 πŸ”— Stiletto is now known as Stilett0
19:35 πŸ”— atrocity what is toonheads?
19:35 πŸ”— godane ToonHeads was an animated showcase of Metro-Goldwyn-Mayer & Warner Bros. cartoon shorts, prominently by animators and voice actors like: Mel Blanc, Tex Avery, Hugh Harman, Rudy Ising, David H. DePatie, Friz Freleng, Chuck Jones, William Hanna, Joseph Barbera, and Daws Butler uncut.
19:36 πŸ”— godane there is an incomplete set on myspleen
19:36 πŸ”— godane 49 episodes of it i think
19:36 πŸ”— atrocity and there has been no release anywhere publicly (like dvd/blu-ray/etc) since it aired?
19:37 πŸ”— dashcloud has quit IRC (Read error: Operation timed out)
19:40 πŸ”— Stilett0 has quit IRC (Ping timeout: 260 seconds)
19:43 πŸ”— bsmith093 godane: holy crap it's been years and it's still there?! grab it please?
19:44 πŸ”— bsmith093 atrocity: none whatsoever, and there probably never will be, except for the single partial episode on the looney tunes golden collection
19:45 πŸ”— atrocity hmm
19:50 πŸ”— atrocity why was there only a partial episode on that? lol
19:50 πŸ”— bsmith093 atrocity godane i have one episode i found on mediafire archive.org/details/ToonheadsArchive
19:50 πŸ”— bsmith093 atrocity: because wb sucks!
19:51 πŸ”— dashcloud has joined #archiveteam-bs
19:52 πŸ”— atrocity lol
19:55 πŸ”— bsmith093 godane: ping, can you get it off myspleen?
19:59 πŸ”— godane yes
20:02 πŸ”— bsmith093 godane: thanks! fos, please?
20:02 πŸ”— atrocity fos?
20:03 πŸ”— godane ok
20:03 πŸ”— godane FOS is a server we use for projects
20:04 πŸ”— atrocity ahh
20:04 πŸ”— atrocity bsmith093: what made you think of toonheads and myspleen? lol
20:06 πŸ”— bsmith093 atrocity: i've been looking for that show for years, on and off. i found a thread on a 4chan archive from 2011 that said a bit of it was on myspleen, but that does me no good, as myspleen's been closed to invites for a while now
20:07 πŸ”— atrocity ahh, ok. just random, hah
20:08 πŸ”— bsmith093 godane: there's actually already a myspleen folder on fos; that's what made me think of asking here.
20:20 πŸ”— bwn has quit IRC (Ping timeout: 246 seconds)
20:42 πŸ”— JesseW has joined #archiveteam-bs
20:43 πŸ”— Stiletto has joined #archiveteam-bs
20:51 πŸ”— tomwsmf-a has joined #archiveteam-bs
20:55 πŸ”— bwn has joined #archiveteam-bs
21:05 πŸ”— Honno has quit IRC (Read error: Operation timed out)
21:17 πŸ”— SketchCow Back home tonight, then a week of solid work.
21:23 πŸ”— JesseW Welcome home (soon)
21:27 πŸ”— JesseW SketchCow: BTW, could you give me a link to a real example of a non-indexed item on IA? I have been told they exist, but haven't seen an actual one.
21:42 πŸ”— ErkDog_ has joined #archiveteam-bs
21:44 πŸ”— ErkDog has quit IRC (Read error: Operation timed out)
21:44 πŸ”— ErkDog_ is now known as ErkDog
21:47 πŸ”— atrocity ugh, i hate having to resize partitions
21:57 πŸ”— dashcloud has quit IRC (Read error: Operation timed out)
22:01 πŸ”— dashcloud has joined #archiveteam-bs
22:33 πŸ”— ohhdemgir ditto
22:34 πŸ”— SketchCow Ha ha no.
22:34 πŸ”— SketchCow But yes, they exist.
22:34 πŸ”— SketchCow Not the most effective thing
22:34 πŸ”— SketchCow People just keep giving away the links to guys in IRC channels
22:39 πŸ”— JesseW Well, I presumed we wouldn't mention the actual URL in the public channel. And I certainly wouldn't mention it anywhere in public. I am just interested in seeing one to check if they operate similarly to the various other categories of items I have already seen (specifically, fully-public items, stream-only items, items with private files, darked items, deleted items and unused identifiers).
22:39 πŸ”— JesseW As part of my IA census efforts.
22:41 πŸ”— JesseW SketchCow: Admittedly, I'm also curious about what the general type(s) of things are considered safe to make public, but not safe enough to make indexed by the internal search engine. But I certainly don't care about any specific ones (not knowing what they are).
22:50 πŸ”— pikhq has quit IRC (Ping timeout: 506 seconds)
22:50 πŸ”— i0npulse has quit IRC (Ping timeout: 506 seconds)
22:52 πŸ”— pikhq has joined #archiveteam-bs
22:54 πŸ”— i0npulse has joined #archiveteam-bs
23:26 πŸ”— metalcamp has quit IRC (Ping timeout: 244 seconds)
23:30 πŸ”— Atros has joined #archiveteam-bs
23:31 πŸ”— atrocity has quit IRC (Ping timeout: 244 seconds)
23:32 πŸ”— midas has quit IRC (Ping timeout: 244 seconds)
23:34 πŸ”— RichardG has quit IRC (Ping timeout: 244 seconds)
23:34 πŸ”— midas has joined #archiveteam-bs
23:34 πŸ”— RichardG has joined #archiveteam-bs
23:48 πŸ”— espes__ has quit IRC (Ping timeout: 244 seconds)
23:54 πŸ”— espes__ has joined #archiveteam-bs
23:56 πŸ”— SimpBrain has quit IRC (Ping timeout: 244 seconds)
23:56 πŸ”— SimpBrain has joined #archiveteam-bs
