[00:00] nginx is great at static :)
[00:00] i've been thinking about this for all of fifteen minutes
[00:00] heh
[00:00] but yeah, banner should be easy
[00:00] I think there's even an nginx module for that
[00:00] i'm not sure if i'm following
[00:00] that can insert HTML on all text/html requests
[00:00] so, you see this page http://web.archive.org/web/20060823114645/http://people.vislab.usyd.edu.au/~ssmith/
[00:00] it's a directory index with a banner at the top
[00:00] Swap e-mails and discuss it.
[00:00] rolfb: i'm duncan@xrtc.net, send me a mail
[00:01] * xmc -> work
[00:01] ok, will do
[00:01] * joepie91_ is admin@cryto.net but you probably won't need me
[00:01] <3
[00:06] #archiveteam needs a piano player in the corner
[00:06] https://www.youtube.com/watch?v=PNA7DcVppEs
[00:06] who doesn't?
[00:07] *** mistym has quit IRC (Remote host closed the connection)
[00:08] awesome
[00:08] One of my favorite songs.
[00:08] I know we're in -bs now
[00:11] *** Nertsy has quit IRC (Ping timeout: 512 seconds)
[00:14] joepie91_, SketchCow: thank you for your time
[00:15] *** Nertsy has joined #archiveteam
[00:16] rolfb: thank *you* :)
[00:17] rolfb: archive team stuff is generally always read-only (except for dmca takedown requests)
[00:18] (or correction of errors in archival)
[00:18] oh, xmc would actually be running the site? that's pretty cool I think :)
[00:18] I've never seen this sort of collaboration :)
[00:18] rolfb, kudos for making it happen
[00:19] balrog: i really don't feel that i deserve any thanks in this matter
[00:26] *** Ymgve has quit IRC ()
[00:33] I am the least pleasant of all the members, so if you can handle me, you can handle anything.
[00:35] *** cbb2 has joined #archiveteam
[00:37] *** X-Scale has quit IRC (Ping timeout: 240 seconds)
[00:37] chatter on twitter that gigaom.com is closing
[00:37] oh no
[00:37] is that too big for archivebot?
[00:37] Nothing's too big for archivebot
[00:37] Archivebot makes ISPs tremble.
Amazon still has scars on its face from archivebot
[00:38] :)
[00:38] *** cbb has quit IRC (Read error: Operation timed out)
[00:40] *** dashcloud has quit IRC (Read error: Operation timed out)
[00:41] *** X-Scale has joined #archiveteam
[00:43] *** dashcloud has joined #archiveteam
[00:47] Atluxity, SketchCow, it's coming from @methewi who's a senior writer and @gigastacey
[00:47] Mike Masnick already asked if we're doing something
[00:47] https://twitter.com/mmasnick/status/575094502223974400
[00:47] I queued it on archivebot
[00:48] oh nice it has sitemaps
[00:49] can archivebot archive archivebot?
[00:49] Yes
[00:51] whether it's a good idea is a separate topic
[00:52] http://www.pbfcomics.com/archive_b/PBF115-Hug_Bot.jpg
[00:52] rolfb: also, thank /you/ indeed - arranging things is important, and it's nice to have somebody who can actually get things done :)
[00:52] * joepie91_ wonders if he can be second-least-pleasant in line behind SketchCow
[00:53] joepie91_: btw, did sircmpwn ever follow through for mediacrush?
[00:53] balrog: I have no idea tbh
[00:53] yep
[00:54] joepie91_: discovering a new possibility was great
[00:54] It's all on archive.org, and Imgrush has a copy too.
[00:54] garyrh: awesome
[00:54] copies of copies of copies of.. :D
[00:54] the archivist in me is satisfied
[00:54] copies all the way down.
[00:55] now for a brief off-topic remark now that everybody is awake here (and any discussion about it should go into -bs):
[00:55] if you haven't seen https://www.factorio.com/ yet, you should do so now
[00:55] watch trailer etc
[00:55] rolfb: there's a reason the wiki "password" is "yahoosucks" :)
[00:56] (though for some reason sites that deliberately block us *really* get on my nerves. like twitpic iirc)
[00:56] balrog: what's the story?
[00:56] rolfb: with yahoo? http://archiveteam.org/index.php?title=Yahoo covers it
[00:56] rolfb: they are the arch nemesis of AT
[00:57] they just shut things down left and right.
[00:58] nemesis sounds about right
[00:58] could someone run youtube-dl --title --continue --retries 4 --write-info-json --write-description --write-thumbnail --write-annotations --all-subs --ignore-errors -f bestvideo+bestaudio/best 'https://www.youtube.com/user/gigaom/videos'
[00:58] they have 2585 videos
[00:58] garyrh: not /best
[00:58] just bestvideo+bestaudio
[00:58] still have to give Yahoo credit though, for being basically the only still operational large internet company with actual, real-world-existing and functioning customer service
[00:59] that /best needs to be taken out of the instructions since it's no longer recommended
[00:59] but that's all the credit they're going to get from me :)
[00:59] I'll remove it from the wiki then.
[00:59] garyrh: any way to parallelize that?
[00:59] dunno
[00:59] bestvideo+bestaudio does require ffmpeg for muxing
[00:59] we need to set up a pipe directly between yahoo's datacenter and IA
[00:59] garyrh: I'll run youtube2internetarchive on it
[00:59] HOWEVER, google no longer provides good muxed video files
[01:00] so just best is not recommended
[01:00] *** dashcloud has quit IRC (Read error: Operation timed out)
[01:00] groupadd: /etc/group.192592: No space left on device
[01:00] uh...
[01:00] >.>
[01:00] oops?
[01:00] i'm off, thanks for helping out and finding solutions
[01:00] gn
[01:01] night rolfb :)
[01:01] thanks for coming! night
[01:01] *** rolfb has quit IRC (Linkinus - http://linkinus.com)
[01:02] *** cbb2 is now known as cbb
[01:03] *** mistym has joined #archiveteam
[01:03] *** dashcloud has joined #archiveteam
[01:05] garyrh: will ytdl in a moment once my move has completed
[01:05] stupid OVH partitioning
[01:06] garyrh: running
[01:06] * mhazinsk is getting youtube IDs now... 342/2585
[01:07] *** primus104 has quit IRC (Leaving.)
[01:07] 1300..
[01:07] 1600..
[01:07] mhazinsk: what are you running it on, a potato?
:D
[01:07] ah boo
[01:08] ERROR: Unable to download webpage: HTTP Error 429: Too Many Requests (caused by HTTPError()); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
[01:08] youtube hates me now :(
[01:08] laptop I'm doing youtube-dl --get-id https://www.youtube.com/user/gigaom/videos on my laptop, is there a better way?
[01:08] ah, if residential connection, I can understand
[01:08] I'm running it from a server
[01:09] all my colo boxes are running EL which doesn't work well with ffmpeg :(
[01:09] how do I make ytdl delay?
[01:09] mhazinsk: shouldn't need ffmpeg
[01:09] wait, EL?
[01:09] bestvideo+bestaudio muxes with ffmpeg
[01:09] oh? missed that
[01:09] EL = enterprise linux, e.g. RHEL or CentOS
[01:11] ah
[01:12] hmm
[01:12] looks like all of OVH IPv6 is blocked
[01:12] from youtube?
[01:12] yes
[01:13] trying with -4 parameter now
[01:13] yep, that works
[01:17] http://om.co/2015/03/09/a-statement-about-gigaom/ :(
[01:32] *** SadDM_ has joined #archiveteam
[01:32] *** swebb sets mode: +o SadDM_
[01:32] *** SadDM has quit IRC (Read error: Connection reset by peer)
[01:37] I'm downloading Gigaom podcasts from Soundcloud.
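The youtube-dl advice in the exchange above (drop `/best` from the format string, pass `-4` when IPv6 is blocked) can be collected into one place. The following is only a sketch in Python that builds the argument list; the helper name `build_ytdl_command` and the default delay of 5 seconds are hypothetical, and using youtube-dl's `--sleep-interval` option to answer the unanswered "how do I make ytdl delay?" question is an assumption, not something anyone in the channel confirmed.

```python
import shlex


def build_ytdl_command(url, force_ipv4=True, sleep_seconds=5):
    """Return a youtube-dl argv list based on the command pasted in the log.

    Hypothetical helper: the flags come from the log, except --sleep-interval,
    which is an assumed answer to the rate-limit (HTTP 429) problem.
    """
    cmd = [
        "youtube-dl",
        "--title", "--continue", "--retries", "4",
        "--write-info-json", "--write-description",
        "--write-thumbnail", "--write-annotations",
        "--all-subs", "--ignore-errors",
        # "/best" fallback removed, as recommended in the log.
        "-f", "bestvideo+bestaudio",
    ]
    if force_ipv4:
        # The log reports that -4 worked when OVH IPv6 was blocked.
        cmd.append("-4")
    if sleep_seconds:
        # Assumption: pause between downloads to avoid 429 responses.
        cmd += ["--sleep-interval", str(sleep_seconds)]
    cmd.append(url)
    return cmd


if __name__ == "__main__":
    argv = build_ytdl_command("https://www.youtube.com/user/gigaom/videos")
    # Print a copy-pasteable shell line instead of running the download.
    print(shlex.join(argv))
```

Note that `bestvideo+bestaudio` still needs ffmpeg on the machine for muxing, as mentioned in the log.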
[01:39] *** dashcloud has quit IRC (Read error: Operation timed out)
[01:43] *** dashcloud has joined #archiveteam
[01:43] running youtube2internetarchive now https://archive.org/search.php?query=creator:%22gigaom%22
[01:55] *** cbb has quit IRC (Quit: cbb)
[02:02] *** abartov has quit IRC (Ping timeout: 258 seconds)
[02:25] *** nertzy has quit IRC (Read error: Connection reset by peer)
[02:25] *** nertzy has joined #archiveteam
[02:28] Can someone please add a link for the public log of this channel into the channel topic
[03:08] *** ohhdemgir has quit IRC (Read error: Operation timed out)
[03:11] *** ohhdemgir has joined #archiveteam
[03:24] http://blog.friendfeed.com/2015/03/dear-friendfeed-community-were.html
[03:24] *** BlueMaxim has joined #archiveteam
[03:31] how does #friendfood sound for an irc channel?
[03:36] fiendfeed?
[03:37] endfeed?
[03:39] humancentifeed?
[03:46] Let's give it to BlueMaxim
[03:46] After 5 years or something
[03:46] #humancentifeed
[04:02] *** kyan has joined #archiveteam
[04:05] So, I assume google-business-sitebuilder is still going along.
[04:06] But what is ovi-store vs ovi-store-attempt_2
[04:06] And cobook
[04:09] ovi-store has half of its pages as 403 so it was completely redone to ovi-store-attempt_2. ovi-store-attempt_2 is done.
[04:09] *** SadDM_ has quit IRC (Ping timeout: 370 seconds)
[04:09] *** dashcloud has quit IRC (Read error: Operation timed out)
[04:12] So I should remove ovi-store and rename ovi-store-attempt_2 to ovi-store
[04:13] *** dashcloud has joined #archiveteam
[04:16] yes, ovi-store can be removed and ovi-store-attempt_2 renamed to ovi-store
[04:16] *** wp494 has quit IRC (Ping timeout: 740 seconds)
[04:21] *** SadDM has joined #archiveteam
[04:21] *** swebb sets mode: +o SadDM
[04:27] *** SadDM has quit IRC (Ping timeout: 370 seconds)
[04:28] It's being shoved into a single item
[04:28] Probably not wise, but oh well
[04:29] I might put it into a set of items.
[04:29] Also: https://archive.org/details/stage6-1996418
[04:29] We save the best
[04:32] *** SadDM has joined #archiveteam
[04:32] *** swebb sets mode: +o SadDM
[04:32] *** mistym has quit IRC (Remote host closed the connection)
[04:32] *** techapj_ has joined #archiveteam
[04:36] *** mistym has joined #archiveteam
[04:40] Yeah, probably going to split ovi-store into 10 items just to prevent meltdown
[04:43] *** mistym has quit IRC (Read error: Operation timed out)
[04:47] Richard Dawkins dead.
[04:47] Archivebot away
[04:48] SketchCow: you sure?
[04:49] I smell trolls
[04:49] *** acridAxid has quit IRC (Quit: Quitting)
[04:57] Fuck it, do it anyway
[04:59] *** dashcloud has quit IRC (Read error: Operation timed out)
[05:02] *** dashcloud has joined #archiveteam
[05:03] *** acridAxid has joined #archiveteam
[05:03] *** wp494 has joined #archiveteam
[05:04] *** mistym has joined #archiveteam
[05:19] SketchCow: cobook was an endangered contact book website. it's done and can be uploaded to ia.
[05:40] *** dashcloud has quit IRC (Ping timeout: 240 seconds)
[05:40] *** dashcloud has joined #archiveteam
[05:54] Thanks
[06:04] *** BlueMaxim has quit IRC (Ping timeout: 370 seconds)
[06:06] *** BlueMaxim has joined #archiveteam
[06:29] archiveteam_archivebot_go_20150310030001: uploading www.walmart.horse-inf-20150309-231028-5u1or-meta.warc.gz: [################################] 1/1 - 00:00:00
[06:29] ha ha, fucking archiveteam
[06:38] archivebot status board
[06:38] you know, actually, hmm
[06:44] lol
[07:02] *** mistym has quit IRC (Remote host closed the connection)
[07:06] make sure there's an entry in the status board for the status board
[07:20] that doesn't make sense but ok
[07:21] provide status of the board, giving the same info as for projects
[07:22] mostly suggested as a joke
[07:22] lol
[07:22] Online: Yes
[07:23] So if the board is having problems you can see them on the board
[07:23] uhm
[07:23] ok
[07:25] the idea was to show the latest item coming in
[07:25] but now that is #-bs material
[07:25] yes
[07:43] heh, yeah someone setup walmart.horse and walmart was piiiissssed
[07:48] *** nico_32 has quit IRC (Ping timeout: 240 seconds)
[07:58] *** nico_32 has joined #archiveteam
[08:06] *** kolen has joined #archiveteam
[08:07] Gigaom (I don't know what is it) is closing http://recode.net/2015/03/09/techs-pioneering-tech-blog-gigaom-closes-down/
[08:08] *** acridAxid has quit IRC (Read error: Operation timed out)
[08:11] *** SadDM has quit IRC (Remote host closed the connection)
[08:16] *** SadDM has joined #archiveteam
[08:16] *** swebb sets mode: +o SadDM
[08:18] *** acridAxid has joined #archiveteam
[08:31] *** SadDM has quit IRC (Ping timeout: 370 seconds)
[08:36] *** Ymgve has joined #archiveteam
[08:49] *** SadDM has joined #archiveteam
[08:49] *** swebb sets mode: +o SadDM
[08:53] *** signius has quit IRC (Ping timeout: 306 seconds)
[08:55] *** schbirid has joined #archiveteam
[09:05] *** signius has joined #archiveteam
[09:23] ***
primus104 has joined #archiveteam
[10:11] *** primus104 has quit IRC (Leaving.)
[10:22] i'm going after gigaom right now
[10:22] i'm grabbing the index pages then grabbing articles by year from that web archive
[10:47] *** brayden__ has joined #archiveteam
[10:52] *** brayden_ has quit IRC (Read error: Operation timed out)
[10:56] *** X-Scale has quit IRC (Ping timeout: 240 seconds)
[10:58] *** X-Scale has joined #archiveteam
[11:00] *** BlueMaxim has quit IRC (Quit: Leaving)
[11:02] *** SadDM has quit IRC (Ping timeout: 370 seconds)
[11:19] *** SadDM has joined #archiveteam
[11:19] *** swebb sets mode: +o SadDM
[11:32] *** SadDM has quit IRC (Ping timeout: 370 seconds)
[11:38] *** Morbus has quit IRC (Quit: http://www.disobey.com/)
[11:43] *** SadDM has joined #archiveteam
[11:43] *** swebb sets mode: +o SadDM
[11:44] http://www.quora.com/Whats-the-current-status-of-FriendFeed
[11:44] At some point, everyone who knows how to bring the service back up will have left Facebook, something will go wrong, and the servers just won't come back. And then a horde of angry Turkish people will yell at me about it on Twitter, and we will all remember the small but well-loved site it was.
[11:44] http://blog.friendfeed.com/2015/03/dear-friendfeed-community-were.html
[11:44] *** Morbus has joined #archiveteam
[11:54] schbirid: #humancentifeed
[11:58] lol :D
[11:58] excellent :)
[12:02] *** SadDM has quit IRC (Ping timeout: 370 seconds)
[12:04] *** SadDM has joined #archiveteam
[12:04] *** swebb sets mode: +o SadDM
[12:11] *** SadDM has quit IRC (Ping timeout: 370 seconds)
[12:24] *** Froggypwn has quit IRC (Read error: Connection reset by peer)
[12:24] *** Froggypwn has joined #archiveteam
[12:27] *** nwf has quit IRC (Read error: Operation timed out)
[12:28] *** fenn has quit IRC (Read error: Operation timed out)
[12:28] *** fenn has joined #archiveteam
[12:28] *** marnold has joined #archiveteam
[12:28] *** yotta has quit IRC (Read error: Operation timed out)
[12:29] *** marvinw has quit IRC (Read error: Operation timed out)
[12:29] *** rduser has quit IRC (Read error: Operation timed out)
[12:29] *** lysobit has quit IRC (Read error: Operation timed out)
[12:29] *** adeodatus has quit IRC (Read error: Connection reset by peer)
[12:30] *** sep332 has quit IRC (Read error: Operation timed out)
[12:30] *** lysobit has joined #archiveteam
[12:30] *** rduser has joined #archiveteam
[12:30] *** Peetz0r_ has quit IRC (Read error: Operation timed out)
[12:30] *** vegbrasil has quit IRC (Read error: Operation timed out)
[12:30] *** Lord_Nigh has quit IRC (Read error: Operation timed out)
[12:31] *** tephra has joined #archiveteam
[12:31] *** josephroo has quit IRC (Read error: Operation timed out)
[12:31] *** phuzion has quit IRC (Read error: Connection reset by peer)
[12:31] *** achip has quit IRC (Ping timeout: 600 seconds)
[12:31] *** Lord_Nigh has joined #archiveteam
[12:31] *** S[h]O[r]T has quit IRC (Read error: Operation timed out)
[12:32] *** balrog sets mode: +o Lord_Nigh
[12:33] *** Ctrl-S has quit IRC (Read error: Operation timed out)
[12:34] *** lrkj has quit IRC (Read error: Operation timed out)
[12:35] *** SadDM has joined #archiveteam
[12:35] *** swebb sets mode: +o SadDM
[12:35] *** tephra_ has quit IRC (Ping timeout: 600 seconds)
[12:35] *** lbft has quit IRC (Read error: Operation timed out)
[12:35] *** yotta has joined #archiveteam
[12:35] *** GLaDOS has quit IRC (Read error: Operation timed out)
[12:35] *** josephroo has joined #archiveteam
[12:35] *** lbft has joined #archiveteam
[12:35] *** vegbrasil has joined #archiveteam
[12:35] *** phuzion has joined #archiveteam
[12:35] *** Ctrl-S has joined #archiveteam
[12:35] *** achip has joined #archiveteam
[12:36] *** sep332 has joined #archiveteam
[12:36] *** nwf has joined #archiveteam
[12:36] *** aMunster has quit IRC (Read error: Connection reset by peer)
[12:36] *** S[h]O[r]T has joined #archiveteam
[12:38] *** GLaDOS has joined #archiveteam
[12:38] *** aMunster has joined #archiveteam
[12:39] *** SadDM has quit IRC (Read error: Connection reset by peer)
[12:39] *** SadDM has joined #archiveteam
[12:39] *** swebb sets mode: +o SadDM
[12:40] *** lrkj has joined #archiveteam
[12:40] *** marvinw has joined #archiveteam
[12:41] *** Peetz0r has joined #archiveteam
[12:48] I didn't expect MediaCrush to shutdown. http://archiveteam.org/index.php?title=MediaCrush
[12:48] Pomf.se still hosts some 2,5TB data I think
[12:49] And it hasn't been archived
[12:49] Word is Pomf.se is being killed some day (this year?) in favor of uguu.se
[12:49] *** SadDM has quit IRC (Ping timeout: 370 seconds)
[12:50] Pomf.se is also making big losses every month, it's been in decline ever since I stopped donating last Fall.
[12:56] chfoo: Did you throw 8chan's boards to be deleted on ArchiveBot?
[13:21] *** sankin has joined #archiveteam
[13:29] pomf noscript message was kinda rude :/
[13:31] *** primus104 has joined #archiveteam
[13:38] *** techapj_ has quit IRC (Ping timeout: 240 seconds)
[13:38] People from 4chan's /g/ liked it so it was kept
[13:41] hmmm, I think pomf has been talked about before
[13:41] unsure what happened to that though
[13:41] *** SadDM has joined #archiveteam
[13:41] *** swebb sets mode: +o SadDM
[13:43] What happened is that neku recently tried to sell some t-shirts to make Pomf.se's financing a bit better, it failed, tried it second time, that failed too. Last thing I heard from him was that he's now closer to getting that Portlane colocation, which would help him with costs
[13:43] Twitter keeps blacklisting a.pomf.se links all the time
[13:44] I believe I destroyed the emails this morning
[13:45] But the server Pomf.se is currently hosted on now is reaching its storage limits soon
[13:45] 2x2TB drives, one full 2TB drive and the RAID was sacrificed for that extra 2TB I believe (if there was any RAID at all)
[13:46] where is it hosted now?
[13:46] Not sure what the word was on dumping archives in the event of possible shutdown
[13:47] Rotab: LeaseWeb leased dedicated
[13:47] oh
[13:47] Unfortunately on a 100 Mbps unmetered connection, it's congested all the time so pings will vary up to 1000ms
[13:48] *** SadDM has quit IRC (Ping timeout: 370 seconds)
[13:49] You destroyed the emails? what
[13:50] As much as it is anti-archive, yeah
[13:50] Privacy policy on that server.
[13:51] They were encrypted with GPG...
[13:52] mail(1) also doesn't save sent messages unless you mail it to yourself
[14:01] Rotab: Netherlands @ LeaseWeb, to be precise.
[14:02] figured..
wasnt even aware they had other datacenters :D
[14:05] they have lots of datacenters Rotab
[14:05] US, Germany, France and US I believe
[14:05] Not sure about Germany
[14:05] yeah, TIL :)
[14:06] *** SadDM has joined #archiveteam
[14:06] *** swebb sets mode: +o SadDM
[14:07] Pomf.se did not have a fun hosting experience. OVH and others kicked neku out after abuse notices started coming in, despite being handled timely. It was no surprise, but neku aimed for low cost until he got tired of being kicked so many times.
[14:09] *** Start has quit IRC (Disconnected.)
[14:11] *** Start has joined #archiveteam
[14:13] *** MorbusIff has joined #archiveteam
[14:13] *** Morbus has quit IRC (Ping timeout: 306 seconds)
[14:15] InstallGentoo archive is based from Green Oval. RBT also has a copy of the IG archive (however it is not merged with RBT, yet). This needs some update then http://archiveteam.org/index.php?title=4chan#Fuuka-based_Archivers
[14:15] 16:13:15 @WubTheCaptain | anounyym1: Uhm, did you say the InstallGentoo database was started from some other database?
[14:15] 16:13:28 @WubTheCaptain | Green Oval?
[14:15] 16:13:32 @anounyym1 | yes, green oval
[14:16] archive.moe doesn't seem to host /g/ anymore
[14:16] Also, the preferred domain for RBT is archive.rebeccablacktech.com
[14:16] But userscripts use rbt.asia anyway
[14:16] RBT's copy of InstallGentoo only has thumbnails, not full images.
[14:18] RBT is supposed to host IG's archive at http://igarchive.rebeccablacktech.com/ however the SQL database had to be reindexed few days ago and it's not up yet
[14:20] And some new info:
[14:20] 16:19:53 @WubTheCaptain | anounyym1: Can you tell about the history of /cgl/ and /w/ archives? Was there a dump or was it started fresh?
[14:20] 16:20:08 @anounyym1 | these are fresh, as well /mu/
[14:22] Should be noted that 4plebs is the only active archive for /pol/ board too, which was concerned some months ago on another IRC channel.
archive.loveisover.me asked 4plebs admin for a dump, he said he'd do it sometime when he had time but I guess that didn't happen
[14:22] *** nertzy has quit IRC (Leaving)
[14:23] /pol/ also being one of the most active boards on 4chan
[14:23] RBT is (was) not interested to archive it some months ago (that was also because of lack of disk space, which was added last week)
[14:26] Also, /sp/ does have an archive called not4plebs. https://totally.not4plebs.org/ But that's the only board they archive.
[14:26] Someone with wiki privileges, if you can update (not interested to register right now)
[14:26] oh, that reminds me
[14:27] i ran a personal fuuka for a short while (circa 2011)
[14:27] i probably don't have a ton of data (<= month) but do you know if somebody would be interested in it?
[14:28] If there's full images, probably some archives that are missing them
[14:28] i think i only stored thumbs
[14:28] Or some other missing data
[14:29] i'll have to look at what i have
[14:29] I can see a good reason why archive.moe doesn't host /g/ anymore, IG's database was fuuka and it's incompatible with foolfuuka that archive.moe uses
[14:29] Sanqui: If anything, just dump it to archive.org
[14:29] *nod*
[14:33] *** OtherFox has joined #archiveteam
[14:33] *** mutoso has quit IRC (Read error: Operation timed out)
[14:33] *** gibigian1 has joined #archiveteam
[14:33] *** edsu_ has joined #archiveteam
[14:34] *** nertzy has joined #archiveteam
[14:37] *** GLaDOS has quit IRC (hub.se efnet.port80.se)
[14:37] *** Lord_Nigh has quit IRC (hub.se efnet.port80.se)
[14:37] *** the_fox has quit IRC (hub.se efnet.port80.se)
[14:37] *** gibigiana has quit IRC (hub.se efnet.port80.se)
[14:37] *** serapeum has quit IRC (hub.se efnet.port80.se)
[14:37] *** mietek has quit IRC (hub.se efnet.port80.se)
[14:37] *** balrog has quit IRC (hub.se efnet.port80.se)
[14:37] *** edsu has quit IRC (hub.se efnet.port80.se)
[14:37] *** lytv has quit IRC (hub.se efnet.port80.se)
[14:37] *** danneh_
has quit IRC (hub.se efnet.port80.se)
[14:37] *** VonGuard has quit IRC (hub.se efnet.port80.se)
[14:37] *** deathy has quit IRC (hub.se efnet.port80.se)
[14:37] *** fresco___ has quit IRC (hub.se efnet.port80.se)
[14:37] *** russss has quit IRC (hub.se efnet.port80.se)
[14:37] *** LittUp has quit IRC (hub.se efnet.port80.se)
[14:37] *** Muad-Dib has quit IRC (hub.se efnet.port80.se)
[14:37] *** Rickster has quit IRC (hub.se efnet.port80.se)
[14:37] *** lhobas has quit IRC (hub.se efnet.port80.se)
[14:37] *** primus104 has quit IRC (Leaving.)
[14:37] *** LordNigh2 has joined #archiveteam
[14:38] *** serapeum_ has joined #archiveteam
[14:38] By the way, there's "more reasons" to prefer the preferred archive.rebeccablacktech.com domain and that is robots.txt: https://archive.rebeccablacktech.com/robots.txt versus https://rbt.asia/robots.txt
[14:38] Think Google images should be allowed?
[14:38] crawler
[14:40] *** balrog_ has joined #archiveteam
[14:40] *** swebb sets mode: +o balrog_
[14:41] *** mutoso has joined #archiveteam
[14:42] *** GLaDOS- has joined #archiveteam
[14:49] *** deathy has joined #archiveteam
[14:49] *** russss has joined #archiveteam
[14:49] *** fresco___ has joined #archiveteam
[14:49] *** Muad-Dib has joined #archiveteam
[14:49] *** Rickster has joined #archiveteam
[14:49] *** lhobas has joined #archiveteam
[14:49] *** danneh_ has joined #archiveteam
[14:49] *** LittUp has joined #archiveteam
[14:49] *** VonGuard has joined #archiveteam
[14:49] *** mietek has joined #archiveteam
[14:52] *** LordNigh2 is now known as Lord_Nigh
[14:52] *** balrog_ is now known as balrog
[14:52] *** GLaDOS- is now known as GLaDOS
[14:55] *** lytv has joined #archiveteam
[15:02] guys, re. archivebot, is there a virtual machine appliance thingie for amazon or other virtual-server provider, so that someone can help by just paying the provider some rental money and let y'all manage the archivebot software ?
[15:03] Resisting registering archive.horse
[15:04] LOOK AT MY HORSE / THIS HORSE THAT LIKES SAVING
[15:04] Sanqui: Sure, if you have any sorta dumps (db/images), I'd be interested in grabbing them off you
[15:06] SketchCow: You might want to register it before I come into thoughts of doing so
[15:06] archive.ninja is also available
[15:09] archive.cat if you're into that too
[15:09] I like .cat most of them all
[15:09] archive.ninja sounds terrible. I saved your data, but you'll never find it!
[15:14] *** mistym has joined #archiveteam
[15:16] has someone got ebooks.horse?
[15:17] I am.... I am........ not interested
[15:22] fche: yes thats possible. you can ask someone in #archivebot
[15:45] WubTheCap: yes, the archivebot job finished a few hours ago
[15:49] *** BlueMaxim has joined #archiveteam
[15:57] *** Start has quit IRC (Disconnected.)
[16:05] *** mistym has quit IRC (Remote host closed the connection)
[16:15] *** kolen has quit IRC (Ping timeout: 240 seconds)
[16:17] *** mutoso_ has joined #archiveteam
[16:21] *** mistym has joined #archiveteam
[16:22] *** mutoso has quit IRC (Read error: Operation timed out)
[16:27] *** primus104 has joined #archiveteam
[16:42] *** dashcloud has quit IRC (Read error: Operation timed out)
[16:44] *** primus104 has quit IRC (Leaving.)
[16:45] *** MMovie has joined #archiveteam
[16:45] *** MMovie1 has quit IRC (Ping timeout: 306 seconds)
[16:46] *** dashcloud has joined #archiveteam
[16:51] *** balrog sets mode: +o Lord_Nigh
[16:55] *** kyan has quit IRC (Remote host closed the connection)
[16:56] *** kyan has joined #archiveteam
[17:00] *** Start has joined #archiveteam
[17:01] *** serapeum_ is now known as serapeum
[17:01] *** Start has quit IRC (Read error: Connection reset by peer)
[17:02] *** Start has joined #archiveteam
[17:25] too bad no .cow TLD
[17:26] I bet .horse was only made because of horse_ebooks anyway, I bet
[17:26] (sorry for -bs content)
[17:35] *** khaoohs has joined #archiveteam
[17:41] *** Start has quit IRC (Disconnected.)
[17:41] *** Start has joined #archiveteam
[17:42] *** Start has quit IRC (Client Quit)
[17:43] *** lag2 has joined #archiveteam
[18:42] *** Start has joined #archiveteam
[18:57] *** primus104 has joined #archiveteam
[18:59] *** Start has quit IRC (Disconnected.)
[19:11] *** Start has joined #archiveteam
[19:15] *** robink has joined #archiveteam
[19:17] *** rolfb has joined #archiveteam
[19:20] *** dashcloud has quit IRC (Quit: No Ping reply in 180 seconds.)
[19:21] *** dashcloud has joined #archiveteam
[19:28] *** Start has quit IRC (Disconnected.)
[19:29] *** nertzy has quit IRC (Quit: Leaving)
[19:42] *** joeyh has quit IRC (Read error: Operation timed out)
[19:44] *** Emcy has quit IRC (Read error: Connection reset by peer)
[19:44] http://www.radiofreeburrito.com/radio-free-burrito-episode-42-here-home-us/
[19:44] "Thanks to Jason Scott, for digitizing a whole bunch of stuff that’s really important to our generation."
[19:44] *** closure has joined #archiveteam
[19:51] *** Emcy has joined #archiveteam
[19:53] *** closure has quit IRC (Ping timeout: 306 seconds)
[19:58] *** closure has joined #archiveteam
[20:03] *** SilSte has joined #archiveteam
[20:03] Hi
[20:03] any plans on gigaom?
[20:04] Got a way for saving facebook pages?
[20:05] SilSte, we have gigaom in archivebot
[20:06] SilSte: I'm also running youtube2internetarchive on their youtube channel https://archive.org/search.php?query=creator:%22gigaom%22
[20:06] thats good
[20:07] i'm doing a bear article pages grab
[20:07] gigaom is sort of slow so thats good its in archivebot
[20:08] https://apps.facebook.com/320986651365615/
[20:08] I'm also getting their podcasts on Soundcloud, should be done downloading soon.
[20:08] you can use youtube-dl on soundcloud
[20:08] *** SilSte_ has joined #archiveteam
[20:08] in case there is no download button
[20:09] had a disconn
[20:09] sounds good :)
[20:10] *** MMovie has quit IRC (Ping timeout: 306 seconds)
[20:11] *** SilSte has quit IRC (Ping timeout: 240 seconds)
[20:18] *** MMovie has joined #archiveteam
[20:29] *** Start has joined #archiveteam
[20:30] *** Start has quit IRC (Read error: Connection reset by peer)
[20:30] huh, thanks. I forgot youtube-dl did soundcloud. I was using https://github.com/mixuuu/SoundCloud-Downloader and wgeting the metadata
[20:30] *** Start has joined #archiveteam
[20:34] *** SadDM_ has joined #archiveteam
[20:34] *** SadDM has quit IRC (Read error: Connection reset by peer)
[20:34] *** swebb sets mode: +o SadDM_
[20:37] *** rolfb has quit IRC (Leaving...)
[20:49] *** SilSte_ has quit IRC (Quit: Page closed)
[20:52] *** lbft has quit IRC (Read error: Operation timed out)
[20:52] *** thefox has joined #archiveteam
[20:52] *** lbft has joined #archiveteam
[20:53] *** sankin has quit IRC (Leaving.)
[20:54] *** mutoso_ has quit IRC (Read error: Operation timed out)
[20:55] *** mutoso has joined #archiveteam
[20:55] *** Sellyme_ has joined #archiveteam
[20:56] *** OtherFox has quit IRC (Ping timeout: 492 seconds)
[20:57] *** PepsiMax has quit IRC (Read error: Operation timed out)
[20:57] *** robink has quit IRC (Ping timeout: 492 seconds)
[20:57] *** dashcloud has quit IRC (Read error: Operation timed out)
[20:58] *** primus has quit IRC (Ping timeout: 492 seconds)
[20:59] *** Sanqui has quit IRC (Ping timeout: 492 seconds)
[20:59] *** joepie91_ has quit IRC (Ping timeout: 492 seconds)
[21:00] *** dashcloud has joined #archiveteam
[21:00] *** mistym has quit IRC (Remote host closed the connection)
[21:01] *** mistym has joined #archiveteam
[21:02] *** joepie91 has joined #archiveteam
[21:02] *** Sellyme has quit IRC (Ping timeout: 724 seconds)
[21:06] *** primus has joined #archiveteam
[21:11] *** Sanqui has joined #archiveteam
[21:13] *** dashcloud has quit IRC (Ping timeout: 260 seconds)
[21:14] *** dashcloud has joined #archiveteam
[21:19] *** Start has quit IRC (Disconnected.)
[21:22] *** MMovie has quit IRC (Ping timeout: 306 seconds)
[21:31] *** skiy has joined #archiveteam
[21:37] *** schbirid has quit IRC (Leaving)
[21:45] *** Ravenloft has quit IRC (Ping timeout: 606 seconds)
[21:57] *** skiy has quit IRC (Leaving)
[21:57] *** mistym has quit IRC (Remote host closed the connection)
[21:58] *** skiy has joined #archiveteam
[21:58] *** mistym has joined #archiveteam
[22:00] *** Kniffy has quit IRC (Quit: pup)
[22:06] *** Nertsy has quit IRC (Ping timeout: 306 seconds)
[22:10] *** Start has joined #archiveteam
[22:11] *** kniffy has joined #archiveteam
[22:12] *** Nertsy has joined #archiveteam
[22:21] *** MMovie has joined #archiveteam
[22:21] *** SadDM_ has quit IRC (Remote host closed the connection)
[22:24] *** MMovie1 has joined #archiveteam
[22:26] *** MMovie has quit IRC (Ping timeout: 306 seconds)
[22:26] *** SilSte has joined #archiveteam
[22:26] *** SadDM has joined #archiveteam
[22:26] *** swebb sets mode: +o SadDM
[22:27] Do you think #Archivebot will save gigaom in time of a potential closure?
[22:28] there are over 3 Million items to download... and the number is increasing
[22:28] with 2 items/second... thats... long
[22:30] I did not know it was that big
[22:33] hey
[22:33] oops
[22:33] s/hey//
[22:33] * Fusl disappears
[22:38] *** skiy has quit IRC (Read error: Operation timed out)
[22:41] *** PepsiMax has joined #archiveteam
[22:44] *** yan has quit IRC (Quit: bye!)
[22:46] 3 million?
[22:47] 913 things, on average, being posted per day?
[22:50] this is a little different from the normal archiving things, but with a little effort, it would have big benefits: https://wiki.documentfoundation.org/DLP/Samples (looking for regression test samples, and samples of unknown/not-well-known file types & versions)
[22:57] *** lag2 has quit IRC (Read error: Operation timed out)
[23:04] can someone help me with this?
https://github.com/ArchiveTeam/terroroftinytown-client-grab#for-debianubuntu
[23:05] i followed each command but it keeps telling me "ImportError: No module named seesaw.script.run_pipeline"
[23:08] *** primus has quit IRC (Read error: Operation timed out)
[23:09] what was the output from the command "pip install seesaw requests"?
[23:09] eh
[23:09] nevermind, i just fixed it
[23:09] pip failed to install it apparently
[23:09] great :)
[23:09] glad you worked it out
[23:10] let us know if there is anything else
[23:11] lets see if it works ootb on a jailed debian https://scr.meo.ws/snapshot/2015-03-11-00-10-56-YM3x2dTP.png
[23:11] not doing much yet apparently :)
[23:12] thats odd
[23:13] oh, no I think its working
[23:14] also, i can see there are directories in mjail7(archive/"archiveteam stuff")/home/archiveteam/terroroftinytown-client-grab/data# find, but there are no files in there
[23:14] *** SadDM has quit IRC (Remote host closed the connection)
[23:14] *** SadDM has joined #archiveteam
[23:14] *** swebb sets mode: +o SadDM
[23:14] I do not think the URLTeam grabber is grabbing files
[23:15] I think it just resolves "tiny-urls" and expands them
[23:15] yes
[23:15] that's what urlteam is
[23:15] i am aware of that
[23:15] so that we have a record of what they once were, when they expire
[23:15] so, no files
[23:16] still
[23:16] it's not doing anything at all
[23:16] no traffic in tcpdump, no information in screen
[23:17] tcpdump should give traffic, yeah
[23:17] what comes on the screen, I am uncertain
[23:17] hrm
[23:17] meh
[23:17] wiped the jail and restarted it
[23:18] seems to work now
[23:18] https://scr.meo.ws/snapshot/2015-03-11-00-18-07-XwHNGxBh.png
[23:18] just this pops up at the beginning now
[23:18] not sure if this is normal
[23:18] https://scr.meo.ws/snapshot/2015-03-11-00-18-30-rmOQYTJn.png
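The "ImportError: No module named seesaw.script.run_pipeline" reported above turned out to be a failed pip install. A quick preflight check along these lines might catch that before starting the grabber. This is a minimal sketch assuming a Python 3 interpreter (the log does not say which Python version was in use); the helper name `module_available` is hypothetical, and the only module name taken from the log is `seesaw.script.run_pipeline`.

```python
import importlib.util


def module_available(dotted_name):
    """Return True if the dotted module name can be located for import.

    Hypothetical helper. find_spec() imports parent packages, so a missing
    top-level package raises ModuleNotFoundError, which is treated as
    "not available" here.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        return False


if __name__ == "__main__":
    # Module name from the error message quoted in the log.
    name = "seesaw.script.run_pipeline"
    if module_available(name):
        print(name + " is importable")
    else:
        print(name + " is missing; the pip install probably failed")
```

If the check reports the module as missing, re-running the pip command from the README and reading its output is the next step, which is what was asked in the channel.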