[00:02] *** icedice has quit IRC (Ping timeout: 245 seconds)
[00:06] *** wp494 has joined #archiveteam-bs
[00:20] *** Stiletto has quit IRC (Read error: Operation timed out)
[00:30] *** Stilett0 has joined #archiveteam-bs
[00:34] *** Stilett0 is now known as Stiletto
[01:10] sears tried really hard. RIP
[01:22] "BobTheBuilder_Zoo-WT.exe" is this someone's personal computer? I thought CIA and FBI files were supposed to be highly secured... :/
[01:28] oh i'm an idiot. nvm
[01:42] *** kristian_ has joined #archiveteam-bs
[02:05] *** username1 has joined #archiveteam-bs
[02:08] *** Famicoman has quit IRC (Ping timeout: 260 seconds)
[02:09] *** schbirid2 has quit IRC (Read error: Operation timed out)
[02:23] *** wp494 has quit IRC (LOUD UNNECESSARY QUIT MESSAGES)
[02:26] *** wp494 has joined #archiveteam-bs
[02:29] *** kristian_ has quit IRC (Quit: Leaving)
[02:45] *** pizzaiolo has quit IRC (Remote host closed the connection)
[02:56] arkiver: please spread the ops in #internetarchive -- also, there's a spammer there who should probably be banned.
[02:57] SketchCow: I am interested in your incrediblly boring thing; I'll see which ones I get to.
[03:15] SketchCow: https://archive.org/details/archiveteam_wiki is confusing; it's megawarcs of various wikis, from 2015 through 2017
[03:15] but I can't figure out where it was dicussed, or where the list of sites to grab came from.
[04:13] Ha
[04:13] It's the Archiveteam Wiki Project.,
[04:18] I know, but the wiki page for that doesn't mention an effort from 2015 through 2017 to grab WARCs, at least not where I could find it...
[04:18] *** ranavalon has quit IRC (Read error: Connection reset by peer)
[04:18] http://archiveteam.org/index.php?title=WikiTeam is pretty clear
[04:21] hm, that page has no mention of the words WARC or Wrapup, which is what made me confused.
[04:22] And the contents of https://archive.org/details/wikiteam (which is linked from there) is all using the wikiteam tools, not WARCs.
[04:23] But yeah, it certainly seems *related*
[04:30] SketchCow: the John Hoy 2000-06-22 tape 3 is really bad
[04:31] *** qw3rty113 has joined #archiveteam-bs
[04:31] its like very bright light so can't really see the picture
[04:35] *** qw3rty112 has quit IRC (Read error: Operation timed out)
[04:51] so i'm on my 3rd tape for today
[04:58] tape?
[05:02] https://i.imgur.com/kLetCe3.png
[05:03] SketchCow: https://archive.org/details/archiveteam_20170123051332 should be in https://archive.org/details/archiveteam_flickr
[05:03] it's in https://archive.org/details/archiveteam_torrents by mistake.
[05:04] ok, I've incremented the required library version and switched the two bad items to new people
[05:04] we'll see if that fixes it
[05:35] SketchCow: OK, here are descriptions for some of your items that lack them: https://hastebin.com/omiwasowop.sql
[05:37] *** Yurume has quit IRC (Read error: Operation timed out)
[05:42] *** Yurume has joined #archiveteam-bs
[05:47] i'm now on my 4th tape
[06:02] its one of the Larry Sanders Show tapes i got
[06:02] btw the label sticker didn't stick to the tape anymore
[06:03] anyways this tape has episode S06E02 S06E03 and S06E05
[06:15] anyone know what timezone "noon" means? http://www.dpconline.org/our-work/bit-list
[06:17] lol RIP. I didnt realize it was a new day. I thought that was yesterday
[06:20] CoolCanuk: UTC, as the organization is based in the UK.
[06:20] *** ZexaronS has quit IRC (Quit: Leaving)
[06:20] so, another six hours.
[06:26] just heard an explosion or maybe a house collapse, or car hit a hydro post or something. Weirdest sound ever
[06:27] (I live in a fairly rural area. Also, it's 1:26 AM). Oh well.
[06:34] please don't die.
[06:49] im fine
[06:49] godane: I'm looking at your grab of NPR's All Things Considered -- were you not able to get the recordings from after 2001?
[06:53] i only have upload up to 2001-12-31
[06:53] i can get more of them
[06:53] godane: Yes please!
[06:54] Considering the current political climate, having offsite storage of as much of NPR as possible seems like a very good idea.
[06:54] good news is i have up to dec 2006 on my drive
[06:55] godane: nice
[06:55] I'm adding a description to the collection.
[06:55] which is how I came across it
[06:59] Somebody2: I've made the additions.
[07:01] SketchCow: thanks; I'm making up another batch now.
[07:01] https://archive.org/details/archiveteam_dutchnews&tab=about <- somehow ended up in a weird format.
[07:03] and I screwed up the links to the archiveteam wiki.
[07:03] on e.g. https://archive.org/details/archiveteam_fanfiction
[07:13] godane: did you extract the UN Radio files from https://www.unmultimedia.org/radio/english/ or somewhere else?
[07:14] and do you have any from 2017 to upload?
[07:14] i only did up to 2016
[07:14] OK.
[07:17] SketchCow: is there anyway to make the podcast-mirror be downloadable
[07:17] SketchCow: here's six more: https://hastebin.com/babetusaho.php
[07:18] and I'm going to sleep.
[07:18] i hate cloudflare? what?
[07:18] xD
[07:19] someone emailed me about Martin Yan shows and not being able to download them
[07:20] https://archive.org/details/Martin_Yan_Shows
[07:21] some try to download this: https://archive.org/download/Martin_Yan_Quick_and_Easy_S01E18/Martin_Yan_Quick_and_Easy_S01E18.avi
[07:21] i can do it in firefox login
[07:21] takedown?
[07:21] "The item is not available due to issues with the item's content."
[07:22] but wget gives me 403 forbidden item
[07:24] I can't get https://ia902303.us.archive.org/29/items/Martin_Yan_Quick_and_Easy_S01E10/Martin_Yan_Quick_and_Easy_S01E10.mp4 to work either hmmmm
[07:25] a bunch of files dont let me download :/
[07:31] anyways i'm up to 1107k items now
[07:32] 1,107,509 items
[07:53] *** Aerochrom has joined #archiveteam-bs
[07:56] :o
[08:56] *** Stiletto has quit IRC (Ping timeout: 250 seconds)
[09:14] *** Mateon1 has quit IRC (Read error: Operation timed out)
[09:16] *** Mateon1 has joined #archiveteam-bs
[09:23] *** jschwart has joined #archiveteam-bs
[09:37] *** alfie has quit IRC (Ping timeout: 248 seconds)
[09:37] *** alfie has joined #archiveteam-bs
[09:38] *** jschwart has quit IRC (Read error: Operation timed out)
[09:39] *** medowar has joined #archiveteam-bs
[09:39] *** tuluu has quit IRC (Ping timeout: 248 seconds)
[09:39] *** midas3 has quit IRC (Ping timeout: 248 seconds)
[09:40] *** midas3 has joined #archiveteam-bs
[09:40] *** tuluu has joined #archiveteam-bs
[09:53] i'm doing the secrets of isis volume 1 tape
[10:06] *** jtn2_ has joined #archiveteam-bs
[10:06] *** Mateon1 has quit IRC (se.hub irc.efnet.nl)
[10:07] *** Ceryn has quit IRC (se.hub irc.efnet.nl)
[10:07] *** jtn2 has quit IRC (se.hub irc.efnet.nl)
[10:07] *** yuitimoth has quit IRC (se.hub irc.efnet.nl)
[10:07] *** SmileyG has quit IRC (se.hub irc.efnet.nl)
[10:07] *** ez has quit IRC (se.hub irc.efnet.nl)
[10:07] *** w0rp has quit IRC (se.hub irc.efnet.nl)
[10:07] *** MrRadar2 has quit IRC (se.hub irc.efnet.nl)
[10:07] *** second has quit IRC (se.hub irc.efnet.nl)
[10:07] *** Fusl has quit IRC (se.hub irc.efnet.nl)
[10:07] *** will has quit IRC (se.hub irc.efnet.nl)
[10:07] *** Tenebrae has quit IRC (se.hub irc.efnet.nl)
[10:07] *** Xibalba has quit IRC (se.hub irc.efnet.nl)
[10:07] *** hook54321 has quit IRC (se.hub irc.efnet.nl)
[10:07] *** pizzaiolo has joined #archiveteam-bs
[10:08] *** i0npulse has quit IRC (Ping timeout: 248 seconds)
[10:10] *** Mateon1 has joined #archiveteam-bs
[10:10] *** yuitimoth has joined #archiveteam-bs
[10:10] *** SmileyG has joined #archiveteam-bs
[10:10] *** ez has joined #archiveteam-bs
[10:10] *** w0rp has joined #archiveteam-bs
[10:10] *** MrRadar2 has joined #archiveteam-bs
[10:10] *** second has joined #archiveteam-bs
[10:10] *** Fusl has joined #archiveteam-bs
[10:10] *** will has joined #archiveteam-bs
[10:10] *** Tenebrae has joined #archiveteam-bs
[10:10] *** Xibalba has joined #archiveteam-bs
[10:10] *** hook54321 has joined #archiveteam-bs
[10:10] *** irc.efnet.nl sets mode: +o hook54321
[10:10] *** i0npulse has joined #archiveteam-bs
[10:17] *** jschwart has joined #archiveteam-bs
[10:18] *** CoolCanuk has quit IRC (Quit: Connection closed for inactivity)
[10:21] *** jschwart has quit IRC (Remote host closed the connection)
[10:25] *** jschwart has joined #archiveteam-bs
[10:42] The way back machine appears to be broken :<
[10:55] so this is odd
[10:55] the isis volume 1 tape has sort of a star trek blooper reel
[10:56] at the end of it
[10:57] *** Ceryn has joined #archiveteam-bs
[11:14] Somebody2: These are somewhat simplistic descriptions, but they're good placeholders, thank you.
[11:21] *** ranavalon has joined #archiveteam-bs
[11:24] godane: When you wake up - I MIGHT have screwed up the Larry Sanders Show tape due to a bad regex. I'd re-upload for safety.
[11:26] don't upload it
[11:26] i'm uploading it right now
[11:26] you misunderstand
[11:26] By mistake, my uploader tried to upload it.
[11:26] There's a chance it's truncated.
[11:26] ok
[11:26] so I'd say re-upload from scratch, it'll overwrite the thing
[11:27] So when you get a chance, re-upload that file, so next time I overwrite the other
[11:27] i'm uploading it RIGHT Now
[11:27] I know
[11:29] i reconnected and renamed the file
[11:29] godane: upload that and put it on youtube :)
[11:29] (Or link it here for the blooper reel!)
[11:31] i will see about splitting the blooper reel from the isis volume 1 tape file
[11:32] just know this tape is like 5 gens removed from masters it looks like
[11:50] *** BlueMaxim has quit IRC (Quit: Leaving)
[12:12] i'm uploading King of Iron Chefs Tournament Grand Finale Part 2 Food Network Tape
[12:12] larry sanders show file is done
[12:21] so i'm at 8501 items this month
[12:44] *** marvinw has quit IRC (Leaving)
[12:45] *** marvinw has joined #archiveteam-bs
[12:47] *** dashcloud has quit IRC (Quit: No Ping reply in 180 seconds.)
[12:49] *** dashcloud has joined #archiveteam-bs
[12:50] *** Stilett0 has joined #archiveteam-bs
[13:27] VHS Ripping, man
[13:28] It never ends
[13:39] I'm uploading Buffy S07E01 Rough Cut And Hanunted S01E01 Revised Pilot Tape
[14:06] godane: did you raid like a horde of VHS Tapes from film studio dumpster?
[14:11] no
[14:11] *** Darkstar has quit IRC (No Ping reply in 180 seconds.)
[14:11] this all from Jason Scott's boxes he mailed me
[14:12] Yeah
[14:13] My fiancee worked for Entertainment Weekly in the early 2000s and got a lot of screener tapes.
[14:19] and whats funny is you got a Entertainment Weekly preview 1999 tape
[14:19] which even though it was not touch it was unplayable
[14:20] the tracking was not working at all on that
[14:20] *** Darkstar has joined #archiveteam-bs
[14:21] anyways i resort some of the tape boxes so the one box i'm putting tapes has room
[14:55] Nice, its great that you are finding all kinds of good shit, If you need help with anything, I'm game
[14:56] I'm in Rochester NY now, so I'm not too far
[15:05] We should just make you work for us at the Strong Museum
[15:05] I have many things stored there, that need ripping/scanning
[15:48] Where is it at Sketch?
[15:49] NVM
[15:49] its here local
[15:49] dude I'm game
[15:49] I'm bored as shit on the weekends
[15:52] *** Aerochrom has quit IRC (Ping timeout: 248 seconds)
[15:52] *** Aerochrom has joined #archiveteam-bs
[15:53] i'm uploading Degrassi Junior High S01E01 to S01E12 The N Woc 2005-10
[15:54] that one is going to take awhile cause its 6 hours
[16:01] i still have about 125gb of digitize tapes to upload
[16:02] i got maybe 16 tapes i can digitize
[16:03] there is maybe another 5 tapes i can't do cause of damage or mold
[16:07] i also have a 250gb of my own rips to upload
[16:07] Ya SketchCow hit me up if you need a man out in Rochester, I live in town and have lots of free time. My workplace used to do DR for a bank and has a TON of empty space just going to waste
[16:08] about 73gb is my family christmas tv recordings
[16:08] one of them goes to week of 1988-11-28 to 1988-12-05
[16:09] *** Stilett0 has quit IRC (Read error: Operation timed out)
[16:11] *** fie has quit IRC (Read error: Operation timed out)
[16:23] *** RichardG_ has joined #archiveteam-bs
[16:24] *** RichardG has quit IRC (Ping timeout: 250 seconds)
[16:25] *** fie has joined #archiveteam-bs
[16:28] *** Stilett0 has joined #archiveteam-bs
[17:00] *** CoolCanuk has joined #archiveteam-bs
[17:01] If there's anything I can do to help with the 508 error, let me know
[17:05] Its just on shared host, it does that
[17:08] I could provide hosting... assuming the user/authentication database is separate from the wiki.
[17:09] probably has been posted, but http://www.dpconline.org/our-work/bit-list is now available
[17:10] SketchCow has been hosting it for years, hes not moving any time soon
[17:10] :) just thought i'd offer
[17:10] Ya
[17:24] *** schbirid2 has joined #archiveteam-bs
[17:30] *** username1 has quit IRC (Read error: Operation timed out)
[17:53] *** ranavalon has quit IRC (Read error: Connection reset by peer)
[17:54] *** ranavalon has joined #archiveteam-bs
[18:22] *** phuzion has quit IRC (Remote host closed the connection)
[18:53] *** jschwart has quit IRC (Quit: Konversation terminated!)
[19:05] *** jschwart has joined #archiveteam-bs
[19:30] *** icedice has joined #archiveteam-bs
[19:36] *** icedice has quit IRC (Ping timeout: 245 seconds)
[19:41] *** icedice has joined #archiveteam-bs
[19:54] https://imgur.com/a/lOgWd "90s Stock Photography is the Bomb"
[19:57] schbirid2: reminds me of http://www.annualreports.com/HostedData/AnnualReportArchive/s/TSX_SEARF_1999.pdf ... their annual report looks like a highschool textbook mixed with windows 95
[19:57] wow
[20:01] we should put this on our 508 page xD https://i.imgur.com/jzaikp2.jpg
[20:17] holy shit
[20:17] I love it
[20:28] does archivebot use ia useragent?
[20:29] or robots.txt at all
[20:44] it does not use the IA useragent
[20:44] or robots.txt
[20:44] perfect
[20:45] !a https://partners.sears.ca/PAMP/Pages/Login/UserLogin.aspx
[20:45] oops.
[20:45] sorry
[20:47] *** TheLovina has joined #archiveteam-bs
[20:50] Is it possible to make an API call to the IA to check for items with a specific tag set to a specific value?
[20:50] (ie check if item exists with "Originalurl" set to "$URL")
[20:51] *** acridAxid has quit IRC (marauder)
[20:52] *** acridAxid has joined #archiveteam-bs
[20:53] bithippo: mabybe
[20:54] https://archive.readme.io/docs
[20:54] Testing https://archive.org/advancedsearch.php#raw out now
[20:56] Extended tubeup to check for an item first before fetching from the source.
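The 20:50 question (does an item already exist with a given metadata value?) can be sketched against advancedsearch.php as below. This is a minimal sketch, not tubeup's actual code: it assumes the field is stored as lowercase `originalurl`, that a quoted phrase query is precise enough to match a URL, and that the JSON reply nests results under `response` -> `docs`. The helper names `build_query_url`, `identifiers_from_response`, and `find_items` are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import List

SEARCH_ENDPOINT = "https://archive.org/advancedsearch.php"


def build_query_url(field: str, value: str, rows: int = 50) -> str:
    """Build an advancedsearch.php URL for items whose `field` matches `value`."""
    # Escape backslashes and quotes, then wrap the value in quotes so a URL
    # (which contains ':' and '/') is searched as a single phrase.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    params = [
        ("q", f'{field}:"{escaped}"'),
        ("fl[]", "identifier"),  # only ask for item identifiers
        ("rows", str(rows)),
        ("output", "json"),
    ]
    return SEARCH_ENDPOINT + "?" + urllib.parse.urlencode(params)


def identifiers_from_response(payload: dict) -> List[str]:
    """Pull identifiers out of the JSON body (response -> docs -> identifier)."""
    docs = payload.get("response", {}).get("docs", [])
    return [doc["identifier"] for doc in docs if "identifier" in doc]


def find_items(field: str, value: str) -> List[str]:
    """Query archive.org; an empty list means no matching item was found."""
    with urllib.request.urlopen(build_query_url(field, value), timeout=30) as resp:
        return identifiers_from_response(json.load(resp))
```

A pre-check could call `find_items("originalurl", source_url)` and skip the download when the list is non-empty. Fetching only `identifier` keeps the reply small, because nothing else is needed to decide whether to skip.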
[20:56] Extending*
[20:59] *** ola_norsk has joined #archiveteam-bs
[21:00] cool
[21:02] Is there a way to determine if two items contain the same file (regardless of filename) without having to download all the files from each item? Been looking at the .sqlite db for one, in hope to find some kind of checksum
[21:03] is the IA sqlite db format documented somewhere?
[21:04] Each item has a torrent file associated with it, could get the file hash from there.
[21:04] that contains the hash of each containing file?
[21:05] Standby, testing.
[21:07] when looking at the sqlite, i note there's e.g "ETag: "3d6c17432d2dd1a48ee0664ca32caeff"" , though i don't know if thats a hash
[21:07] of the file, i mean
[21:09] I'm wrong, I thought you could get a hash of the file from the torrent file, but you need to retrieve the file first. I would _assume_ that ETag is a hash of some sort.
[21:11] yeah, it does look to be unique for each file in an item as far i can tell so far. I found it in the s3api_per_key_metadata > headers
[21:11] not "yeah, you are wrong", but that
[21:11] took me a second to write that line, sorry
[21:12] ill try downloading a file and see if its md5
[21:12] wish the "headers" column wasnt so messy though lol
[21:13] Agree! And its okay, I don't mind being wrong :D
[21:15] that's the best kind of right there is :D
[21:15] lol
[21:16] Something cool, each file's crc32, md5, and sha1 hashes are available per item in the *_files.xml file within the item.
[21:16] Example: https://ia801609.us.archive.org/30/items/youtube-ARrNYyJEnFI/youtube-ARrNYyJEnFI_files.xml
[21:16] (xml format)
[21:16] thank you!
[21:18] Helpful if you know the item/item URL, but doesn't help if you want to search by hash, sorry about that!
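The `_files.xml` tip from 21:16 is enough to answer the 21:02 question: you can tell whether two items share a file without downloading the files themselves. Here is a minimal sketch. It assumes each `<file>` element carries `name` and `source` attributes and an `<md5>` child, as in the linked example. It also assumes the manifest can be fetched from `https://archive.org/download/<id>/<id>_files.xml`. All the function names are hypothetical. Matching MD5s means the contents are identical for any practical purpose when the duplicates are accidental.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def files_xml_url(identifier: str) -> str:
    # Assumed download path for an item's file manifest.
    return f"https://archive.org/download/{identifier}/{identifier}_files.xml"


def fetch_files_xml(identifier: str) -> bytes:
    with urllib.request.urlopen(files_xml_url(identifier), timeout=30) as resp:
        return resp.read()


def md5_by_name(files_xml: bytes, originals_only: bool = True) -> Dict[str, str]:
    """Map file name -> lowercase md5 from a *_files.xml document."""
    root = ET.fromstring(files_xml)
    result: Dict[str, str] = {}
    for f in root.iter("file"):
        # Derivatives (thumbnails, transcodes) are generated, so skip them by default.
        if originals_only and f.get("source") != "original":
            continue
        md5 = f.findtext("md5")
        if md5:
            result[f.get("name", "")] = md5.strip().lower()
    return result


def shared_files(a: Dict[str, str], b: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (name_in_a, name_in_b) pairs whose contents have the same md5."""
    names_by_hash: Dict[str, List[str]] = {}
    for name, digest in b.items():
        names_by_hash.setdefault(digest, []).append(name)
    return [
        (name_a, name_b)
        for name_a, digest in sorted(a.items())
        for name_b in names_by_hash.get(digest, [])
    ]
```

For two hypothetical items `first-item` and `second-item`, `shared_files(md5_by_name(fetch_files_xml("first-item")), md5_by_name(fetch_files_xml("second-item")))` lists the duplicated originals, even when they were renamed.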
[21:19] im just trying to fix one item, where i accidentally uploaded some stuff from a previous item :]
[21:20] though im unfortunatly sure if if the files are named the same, or how many of the first item ended up in the second item
[21:20] *** ranavalon has quit IRC (Read error: Connection reset by peer)
[21:20] not*
[21:21] *** ranavalon has joined #archiveteam-bs
[21:22] the xml is gold stuff! so thanks again! :D
[21:22] no trouble at all!
[21:30] *** ranavalon has quit IRC (Remote host closed the connection)
[21:30] *** ranavalon has joined #archiveteam-bs
[21:32] if i mv an (original) file from one item to another, the derivative files would get moved, deleted, regenerated as well? e.g would thumbnail files risk getting left behind?
[21:34] would it perhaps be wiser to just do a "ia copy" and then "ia delete", instead of "ia mv" ?
[21:35] *** ranavalon has quit IRC (Read error: Connection reset by peer)
[21:35] *** ranavalon has joined #archiveteam-bs
[21:37] i saw in Internet Archive Help faq pages something a button with "Re-Derive", but i've yet to see it on an item
[21:37] *** jschwart has quit IRC (Konversation terminated!)
[21:39] *** BlueMaxim has joined #archiveteam-bs
[21:40] 150.000 vidoes ... https://www.youtube.com/watch?v=Ih2CIdnHWms&ab_channel=MellisaHoneybeeZaccaria
[21:41] i guess child exploitation was sorely rampant...
[21:43] 150K vidoes deleted. Though i have my doubts they were all "youtube kids videos targeted by child predators"
[21:43] oh well
[21:50] *** ranavalon has quit IRC (Read error: Connection reset by peer)
[21:50] *** ranavalon has joined #archiveteam-bs
[22:02] *** ranavalon has quit IRC (Read error: Connection reset by peer)
[22:02] *** ranavalon has joined #archiveteam-bs
[22:08] is it just me, or does the statistics of 270 youtube (video?) accounts terminated, and 150K videoes deleted somewhat mismatch ?
[22:08] lots of videos per account, seems reasonable to me
[22:10] in my experience it would be 250 videos per of these accounts. Going by the first one and foremost reknowned ones
[22:10] ~250 to 300
[22:11] but they did certainly kept popping back up, that is undeniable
[22:20] *** bithippo has quit IRC (Max SendQ exceeded)
[22:21] hopefully no "out baby lost his first tooth" family memrobilia got mingled into the debacle
[22:21] our*
[22:21] oooh that would be bad!
[22:22] first steps, etc
[22:22] aye
[22:26] *** RichardG_ is now known as RichardG
[22:30] *** Pixi has quit IRC (Quit: Pixi)
[22:30] *** Pixi has joined #archiveteam-bs
[22:44] *** icedice2 has joined #archiveteam-bs
[22:47] *** icedice has quit IRC (Ping timeout: 245 seconds)
[22:51] Can we please get on this because it's going to be a shit show if we don't do it properly.... https://www.reddit.com/r/DataHoarder/comments/7gce4n/psndl_being_shut_down_if_not_enough_donations/
[22:51] arkiver schbirid2 chfoo balrog SketchCow ^^^
[22:56] so they use that site instead of paying $13 a month?
[22:57] i'd contact this person and just ask if he'd let us download the files... https://github.com/jamesst20
[22:58] i dunno. Can you explain what PSNDL does?
[22:59] I have no idea, I'm not in the know about ps3 or whatever, as far as I can tell it points to the official cdn and isn't serving the package files himself, the site just acts like a portal, stick it on archivebot just to grab the site itself, if we must we can concern ourselves over the packages later on (hosted by sony, not going anywhere anytime soon)
[23:01] Looks like JAA already grabbed it with AB.
[23:21] *** Stilett0 has quit IRC (Ping timeout: 250 seconds)
[23:33] *** Polylith has quit IRC (Quit: ZNC - http://znc.in)
[23:35] *** icedice2 has quit IRC (Quit: Leaving)
[23:36] may i ask what AB is?
[23:38] archivebot
[23:38] ty
[23:38] * ola_norsk is archive team n00b :)
[23:39] :)
[23:40] this reminds me, is there any way to circumenvent waybackmachine captures that's blocked by robots.txt?
[23:42] i actually think it was a news paper article that i tried to archive in relation to an item, that was blocked from being archived by their servers robots.txt
[23:42] Depends on how the data was grabbed. If archiveteam did it the raw warcs is probably around in a collection somewhere.
[23:42] Archivebot does not care about robots.txt
[23:43] how do i use the !ia ?
[23:43] i remember i was told how a while back
[23:43] !ia help
[23:44] You mean !a or !ao? Those are archivebot commands. See #archivebot.
[23:45] http://archivebot.readthedocs.io/en/latest/commands.html
[23:55] how can we archive twitter/fb?
[23:55] pages/profiles
[23:58] !a{o} https://twitter.com/somethingorother --igset twitter --phantomjs
[23:58] i've been running curl requests to web.archive.org/save/ to get the hashtags surrounding #netneutrality
[23:58] And then pray that phantomjs works on the pipeline that ends up on.
[23:58] would it do it twitter's own archives?
[23:59] Unsure what that means, and I'm going to go sleep now.
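The 23:58 approach (curl requests to web.archive.org/save/) can be sketched in Python as below. It assumes Save Page Now still accepts a plain GET to `https://web.archive.org/save/<url>`, the form used in the chat. The helpers `hashtag_url` and `save_all` are hypothetical, and so is the 10-second pause; it is a courtesy delay, not a documented rate limit. Pages that build their content with JavaScript, such as Twitter, may not be captured completely this way.

```python
import time
import urllib.error
import urllib.parse
import urllib.request
from typing import Dict, Iterable, Optional

SAVE_PREFIX = "https://web.archive.org/save/"


def hashtag_url(tag: str) -> str:
    # Hypothetical target: the Twitter hashtag page for `tag` (leading '#' optional).
    return "https://twitter.com/hashtag/" + urllib.parse.quote(tag.lstrip("#"), safe="")


def save_request_url(target: str) -> str:
    # Save Page Now is triggered by requesting the save prefix followed by the target URL.
    return SAVE_PREFIX + target


def save_all(targets: Iterable[str], delay: float = 10.0) -> Dict[str, Optional[int]]:
    """Request a capture of each target, pausing between requests; returns HTTP status per target."""
    statuses: Dict[str, Optional[int]] = {}
    for target in targets:
        try:
            with urllib.request.urlopen(save_request_url(target), timeout=120) as resp:
                statuses[target] = resp.status
        except urllib.error.HTTPError as err:
            statuses[target] = err.code  # e.g. rate limiting or a blocked page
        except urllib.error.URLError:
            statuses[target] = None  # network failure, no HTTP status
        time.sleep(delay)  # be gentle with the service
    return statuses
```

For example, `save_all([hashtag_url("#netneutrality")])` requests one capture and returns its status. HTTPError is caught before URLError because it is a subclass of URLError.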