[00:10] Asparagir: what about rtmpdump? 99% of the streaming content is rtmp
[00:10] you could try livestreamer, based on rtmpdump, it could just work out of the box
[00:12] Oh wow, livestreamer looks perfect. Thanks! http://livestreamer.tanuki.se/en/latest/
[00:13] Asparagir: yeah, it can work great, but it's luck
[00:13] if it doesn't, just use rtmpdump, it's included within the livestreamer release
[00:27] youtube-dl works
[00:27] iirc I used it to download some stuff
[00:28] youtube-dl works for so many things
[00:28] it's like a sonic screwdriver for web video
[00:29] yep, I think they support several hundred websites by now
[00:29] geez
[00:31] Nemo_bis: can you watch live streams with youtube-dl?
[00:33] the -dl part of it suggests not
[00:33] but who knows
[00:44] Livestreamer seems to be doing the trick. Really easy to install. Currently grabbing a stream from a local Utah station.
[00:44] So, thanks!
[02:08] is there an archive of shoutcast.com and playlists?
[03:45] With ArchiveBot, how long is it before the upload to IA starts?
[03:46] Sum1: they are uploaded in batches
[03:46] I don't think there's a fixed interval for that
[03:46] afaik it's just "when the server is full"
[03:47] Ah, thanks.
[03:58] another band that bothers me is Darkside of Innocence, who post unfinished mixes of their tracks on soundcloud for a couple of days and then pull them down again
[03:58] I love the band
[03:58] but not that
[03:58] Ouch. That was supposed to be in -bs
[03:58] sorry
[04:44] https://twitter.com/textfiles/status/414242871261548544
[07:23] https://dnscensus2013.neocities.org/
[08:03] DFJustin: done
[08:05] http://www.poynter.org/latest-news/top-stories/234307/singaporean-government-bureaucracy-effectively-closes-news-site/
[12:04] winamp should be closed now...
[12:04] but still online
[12:04] :D
[12:05] archiving everything as much as I can
[12:05] everything is probably already downloaded
[12:05] but just doing a second grab to be sure everything is there 100%
[12:14] http://blogs.smithsonianmag.com/science/2013/12/the-vast-majority-of-raw-data-from-old-scientific-studies-may-now-be-missing/
[13:45] joepie91: I have been thinking of going after data for some time, seems like a good idea to start now. I have access to a lot of resources (papers, databases, etc.) via my university
[13:46] tephra: I assume those are normally restricted-access documents?
[13:47] because you'd probably want to look into pdfparanoia
[13:51] joepie91: well yes, like most they say no automated downloading, and there are restrictions on sharing material, especially papers
[13:51] right, definitely pdfparanoia (and general carefulness) then
[13:52] (pdfparanoia == watermark stripper)
[13:53] nice, downloaded.
[14:01] tephra: if you need help with automation of this, let me know :)
[14:06] joepie91: thanks! I think automation of paper downloading would be a bad idea since it could get me kicked out of school, but I'm making a list of databases that would be good to mirror. Will start tomorrow when
[14:49] a gentle reminder: #shipwretched . ~5 days remaining.
[14:51] oh, it only needs to get 46 times faster to complete
[14:54] everyone!!
[14:54] http://www.scirus.com/
[14:54] new website dying:
[14:54] one of the biggest scientific search engines
[14:54] but
[14:55] I think we can't exactly "download" search engines, right?
[14:55] arkiver: this might be a good opportunity to do some diplomatic negotiations about them uploading their dataset to IA
[14:56] :)
[14:56] hmm
[14:56] shall I try to mail them?
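For reference, a minimal sketch of the stream-grabbing workflow discussed at 00:10-00:44 above; the stream URL and output filenames are placeholders, not the actual station being grabbed:

    # install livestreamer (per the chat, rtmpdump ships with the livestreamer release;
    # on Linux you may need to install rtmpdump separately)
    pip install livestreamer

    # let livestreamer pick the protocol/plugin and write the best-quality stream to disk
    livestreamer "rtmp://streams.example.com/live/stream" best -o station.flv

    # fall back to rtmpdump directly if livestreamer can't handle the stream
    rtmpdump --live -r "rtmp://streams.example.com/live/stream" -o station.flv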
[14:56] oh jesus fuck
[14:56] elsevier
[14:56] of fucking course
[14:56] elsevier?
[14:56] arkiver: please do
[14:56] I don't think I can type a full e-mail to Elsevier
[14:56] without ranting at them
[14:56] Elsevier owns that site
[14:57] ah I see
[14:57] "Elsevier B.V."
[14:57] but maybe it would be better to ask if yipdw or SketchCow can send an email
[14:57] since I haven't been into this for very long
[14:57] perhaps
[14:57] so there are some things I don't know yet
[14:57] paging SketchCow, paging SketchCow
[14:58] and I don't know exactly what the IA can and can't handle
[14:58] what do you mean?
[14:58] paging?
[15:05] arkiver: just highlighting him and pointing out that it's important :)
[15:05] arkiver: http://en.wikipedia.org/wiki/Public_address
[15:06] joepie91: ah, I see... :D
[15:07] in the old days before cell phones, people in, for example, a hospital would say "Paging Dr. XXX, please call number yyy" over the loudspeaker system
[15:07] so no matter where XXX was, he got the message and could call whoever tried to contact him
[15:08] Ymgve: ah, haha, I get it now
[15:08] thanks
[15:18] does someone know if it is possible to download a website for which you need an account?
[15:22] arkiver: yes.
[15:22] you will need to export your cookies
[15:22] and import them into wget/httrack/whatever
[15:27] yay, this time I managed to follow CameronD's suggestion, with some changes https://github.com/ArchiveTeam/wretch-grab/pull/2
[15:31] joepie91: hmm, I will take a look at whether that is possible with heritrix
[15:31] arkiver: I feel like you're the only user of heritrix besides IA
[15:31] heh
[15:31] but would it be legal to download a forum, including the account-only parts, and then upload it to the archive?
[15:33] joepie91: yeah, sometimes I have that feeling too... :P
[15:33] but would it be legal to download a forum, including the account-only parts, and then upload it to the archive?
[15:33] don't worry about that
[15:33] archive first, ask questions later
[15:33] :)
[15:34] I don't really get the point of using wget when there is also heritrix, which is made to create WARCs
[15:34] arkiver: wget setup time, 12 seconds (apt-get install wget)
[15:34] heritrix setup time... how many days? :P
[15:34] ??
[15:34] with setup time you mean getting it running?
[15:34] yes
[15:34] lol
[15:34] less than 12 seconds
[15:35] if you are running windows (like me):
[15:35] create "run.bat" in the bin folder
[15:35] add "heritrix -a admin:admin"
[15:35] in that bat file
[15:35] click it
[15:35] done!
[15:36] that's it
[15:38] and it's working very well
[15:39] if the internet connection is out
[15:39] it pauses the download for 15 minutes
[15:39] and then tries again, up to 30 times
[15:39] so I don't have to worry about my internet connection
[15:39] and I think it's running way faster than other programs
[15:39] also
[15:39] it unpacks swf files
[15:39] and checks them for links
[16:31] huh, really
[16:31] that's actually kinda cool
[16:43] so can anyone explain to me what the 4 download/upload speeds at the bottom left of the localhost:8001 page mean, exactly?
[16:44] bottom speeds = current and top speeds = total?
[16:49] green numbers are data downloaded and download speed
[16:49] white numbers are data UPLOADED and upload speed
[16:50] the number at the top of the chart is the max upload/download speed within that chart
[16:50] (which is why it goes up and down as the chart resizes)
[16:52] ah thanks!
[17:53] hey hey ^.^
[18:17] Hi.
[18:19] Hello.
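To make the cookie-export advice from 15:22 concrete, here is a minimal sketch of grabbing an account-only site into a WARC with wget; the hostname and cookie file are placeholders, and the export step depends on your browser (a "cookies.txt"-style extension usually works):

    # log in with a normal browser, export the session cookies to cookies.txt,
    # then let wget replay that session while writing a WARC
    wget --load-cookies cookies.txt \
         --mirror --page-requisites --wait=1 \
         --warc-file=forum.example.com --warc-cdx \
         -o forum.example.com.log \
         "http://forum.example.com/"

Note the grab only works for as long as the exported session cookies stay valid, so a long crawl may need the cookies re-exported partway through.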
Maybe
[19:02] Whoever uploaded substantial amounts of USENET to the Internet Archive, thanks! :D
[19:03] ZoeB: you should promote your USENET tool here; other folks might like to use it
[19:03] Oh, the https://github.com/ZoeB/arcmesg one?
[19:03] I think so
[19:04] Much like how Git works behind the scenes, you can throw tonnes of messages at it, and it sorts them neatly into a directory with filenames that are SHA1s of the unique message ID
[19:05] My dream here is to somehow implement the Message-ID part of the URL spec, so you can link to the message ID and know where you can look it up with reasonable certainty that it'll be there. :)
[19:05] But I'm not *that* good a programmer, so I'm certainly open to features, bug fixes, etc
[19:07] But you can give it POP3 details, NNTP details, or now a single-message-per-file list of local files (useful in combination with Git, which can split an mbox into one file per message) and it'll import all of the messages from all these places into a big Katamari-style directory
[19:09] Anyway, anyone's free to use it. I've got several gigabytes' worth of messages in my personal stash, mostly from USENET, but also a few public mailing lists (mostly GNU, plus a few more obscure ones). The existing USENET archives on the Internet Archive plus Gmane make it somewhat redundant, but I'm all in favour of redundancy where data preservation is concerned. :)
[19:10] It's a fun use of a Raspberry Pi IMO, anyway. :D
[19:11] I think someone here created this site: http://olduse.net/ which shows you what you would've seen many years ago today on Usenet
[19:12] Yes, that one's neat too!
[19:13] I started off just archiving a few synthesiser-based mailing lists over at http://analogue.bytenoise.co.uk but I kinda increased my scope a bit >.>
[19:14] Eventually I hope to put this on a website so that all the messages are searchable and browseable, and to put up a torrent of the message tarball, but those are a while off yet. I'm hoping to amass a few TBs' worth of messages first.
[19:14] you could use doxbin's source
[19:14] iirc they have a search bar thinger
[19:15] Doxbin?
[19:16] it's exactly what it sounds like
[19:16] Heh, OK
[19:16] i'd provide a link but it's easy to google and i'd rather not get b& ::P
[19:16] *:P
[19:17] Searching does get a bit tricky when you get into the several-gigabytes'-worth-of-data territory, yeah.
[19:17] Although a nice thing about messages is that they're plaintext, but still, grepping them would take a while
[19:17] BIAB, helping Nina install furniture...
[19:32] back
[19:35] This olduse.net is such a time suck... reminds me of when I was using Pine via a 7" black and white monitor. :)
[19:36] downloading rationalwiki
[19:36] this is gonna take a while :|
[19:37] Ooh, good one
[19:46] I just got a few transsex and intersex specific sites (research and support groups), and some obscure operating system sites. Not that I have anything to actually do with these warcs, mind. :) Still, it'll be nice to preserve more obscure parts of what this period in history's like.
[19:53] OMG, KA9Q's site is still up! :gets:
[19:58] Back when I worked for an ISP, I used to fear getting a tech support call for KA9Q
[19:59] ZoeB: glad you like my olduse.net :)
[20:00] Ah, it's yours? Yeah, it's really neat!
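A rough sketch of the content-addressed naming scheme ZoeB describes at 19:04; this is not arcmesg's actual code, just an illustration of storing one message per file under the SHA1 of its Message-ID (message.eml and the directory name are placeholders):

    # pull the Message-ID header out of a single-message file
    msgid=$(grep -i -m1 '^Message-ID:' message.eml | sed 's/^[^:]*:[[:space:]]*//')
    # hash it so the filename is stable regardless of where the copy came from
    sha1=$(printf '%s' "$msgid" | sha1sum | awk '{print $1}')
    mkdir -p messages
    cp message.eml "messages/$sha1"

A nice side effect of this Git-like scheme is deduplication: two copies of the same message pulled from different servers hash to the same filename.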
[20:00] ZoeB: not sure if you know about it, but there's archivebot in #archivebot that will automatically archive smaller sites and pages if you have ops (for handling small jobs and making sure things can get archived quickly, in a standard fashion)
[20:01] Ooh, I heard about archivebot in a speech of Jason Scott's I was just watching. Neat, thanks!
[20:01] What is the standard fashion, incidentally? It'd be nice to use that when I'm grabbing stuff too, I'm guessing
[20:01] still looking for usenet archives from the '90s
[20:02] closure: So http://article.olduse.net/5946%40unc.UUCP was doing the rounds, and I showed it to my mother, and then we got reminiscing about her old Philips word processor with that kind of screen. So thank you for that. :)
[20:02] I'd love to get my hands on the Walnut Creek USENET CD-ROMs
[20:02] ZoeB: standard fashion?
[20:02] anyone can use it
[20:03] * closure too
[20:03] yipdw: dashcloud said something about a standard fashion. I assumed he meant something like standard wget arguments, filenames to save to, etc?
[20:03] ZoeB: oh
[20:03] well, I presume "he". I should know better too. :/
[20:03] archivebot automates Archive Team standard practices, i.e. shoving stuff into a gzipped WARC
[20:04] Ah, cool
[20:04] it has limitations
[20:04] I use this personally: "wget -mbc --warc-file=www.example.com --warc-cdx --wait=5 http://www.example.com -o www.example.com.log" But I'm open to suggestions, it'd be nice to conform to what everyone else is doing for consistency's sake, to make other people's lives easier
[20:04] here's the github for it: https://github.com/ArchiveTeam/ArchiveBot
[20:04] biab, you groovy people
[20:05] two big ones: (1) it doesn't yet get Flash videos; (2) Javascript-heavy sites may or may not work, depending on how AJAXed up they are; and (3) there are capacity limits
[20:05] and I just realized I put three for two, oh well
[20:05] but that said it seems to do pretty well based on what I've seen coming in
[20:06] I hear Heritrix has SWF-specific filters; maybe it'd make sense to integrate that
[20:08] closure: how much she liked browsing that site: https://twitter.com/thomasbeth/status/407593998195257346
[20:09] So thank you. :)
[20:28] Has anyone imported the VM image into VirtualBox on the command line?
[20:53] hm, clanbase shut down. not sure if anyone has a backup
[22:24] "Don't look down, never look away; ArchiveBot's like the wind." Nice. :)
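The 20:28 VirtualBox question goes unanswered in this log; for what it's worth, a command-line import of an exported appliance usually looks something like the following. The .ova filename and VM name are placeholders; the log does not say which image was meant, though the ArchiveTeam warrior appliance is a likely candidate:

    # import an exported appliance (.ova/.ovf) without the GUI
    VBoxManage import warrior.ova --vsys 0 --vmname "archiveteam-warrior"
    # then boot it headless
    VBoxManage startvm "archiveteam-warrior" --type headless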