#archiveteam-bs 2017-11-30,Thu


Who What When
***icedice has quit IRC (Ping timeout: 245 seconds)
wp494 has joined #archiveteam-bs
[00:02]
Stiletto has quit IRC (Read error: Operation timed out) [00:20]
Stilett0 has joined #archiveteam-bs
Stilett0 is now known as Stiletto
[00:30]
........ (idle for 36mn)
CoolCanuksears tried really hard. RIP [01:10]
"BobTheBuilder_Zoo-WT.exe" is this someone's personal computer? I thought CIA and FBI files were supposed to be highly secured... :/ [01:22]
oh i'm an idiot. nvm [01:28]
***kristian_ has joined #archiveteam-bs [01:42]
..... (idle for 23mn)
username1 has joined #archiveteam-bs
Famicoman has quit IRC (Ping timeout: 260 seconds)
schbirid2 has quit IRC (Read error: Operation timed out)
[02:05]
wp494 has quit IRC (LOUD UNNECESSARY QUIT MESSAGES)
wp494 has joined #archiveteam-bs
kristian_ has quit IRC (Quit: Leaving)
[02:23]
.... (idle for 16mn)
pizzaiolo has quit IRC (Remote host closed the connection) [02:45]
Somebody2arkiver: please spread the ops in #internetarchive -- also, there's a spammer there who should probably be banned.
SketchCow: I am interested in your incredibly boring thing; I'll see which ones I get to.
[02:56]
.... (idle for 18mn)
SketchCow: https://archive.org/details/archiveteam_wiki is confusing; it's megawarcs of various wikis, from 2015 through 2017
but I can't figure out where it was discussed, or where the list of sites to grab came from.
[03:15]
............ (idle for 58mn)
SketchCowHa
It's the Archiveteam Wiki Project.
[04:13]
Somebody2I know, but the wiki page for that doesn't mention an effort from 2015 through 2017 to grab WARCs, at least not where I could find it... [04:18]
***ranavalon has quit IRC (Read error: Connection reset by peer) [04:18]
SketchCowhttp://archiveteam.org/index.php?title=WikiTeam is pretty clear [04:18]
Somebody2hm, that page has no mention of the words WARC or Wrapup, which is what made me confused.
And the contents of https://archive.org/details/wikiteam (which is linked from there) is all using the wikiteam tools, not WARCs.
But yeah, it certainly seems *related*
[04:21]
godaneSketchCow: the John Hoy 2000-06-22 tape 3 is really bad [04:30]
***qw3rty113 has joined #archiveteam-bs [04:31]
godaneits like very bright light so can't really see the picture [04:31]
***qw3rty112 has quit IRC (Read error: Operation timed out) [04:35]
.... (idle for 16mn)
godaneso i'm on my 3rd tape for today [04:51]
CoolCanuktape?
https://i.imgur.com/kLetCe3.png
[04:58]
Somebody2SketchCow: https://archive.org/details/archiveteam_20170123051332 should be in https://archive.org/details/archiveteam_flickr
it's in https://archive.org/details/archiveteam_torrents by mistake.
ok, I've incremented the required library version and switched the two bad items to new people
we'll see if that fixes it
[05:03]
....... (idle for 31mn)
SketchCow: OK, here are descriptions for some of your items that lack them: https://hastebin.com/omiwasowop.sql [05:35]
***Yurume has quit IRC (Read error: Operation timed out) [05:37]
Yurume has joined #archiveteam-bs [05:42]
godanei'm now on my 4th tape [05:47]
.... (idle for 15mn)
its one of the Larry Sanders Show tapes i got
btw the label sticker didn't stick to the tape anymore
anyways this tape has episode S06E02 S06E03 and S06E05
[06:02]
CoolCanukanyone know what timezone "noon" means? http://www.dpconline.org/our-work/bit-list
lol RIP. I didnt realize it was a new day. I thought that was yesterday
[06:15]
Somebody2CoolCanuk: UTC, as the organization is based in the UK. [06:20]
***ZexaronS has quit IRC (Quit: Leaving) [06:20]
Somebody2so, another six hours. [06:20]
CoolCanukjust heard an explosion or maybe a house collapse, or car hit a hydro post or something. Weirdest sound ever
(I live in a fairly rural area. Also, it's 1:26 AM). Oh well.
[06:26]
Somebody2please don't die. [06:34]
.... (idle for 15mn)
CoolCanukim fine [06:49]
Somebody2godane: I'm looking at your grab of NPR's All Things Considered -- were you not able to get the recordings from after 2001? [06:49]
godanei only have uploaded up to 2001-12-31
i can get more of them
[06:53]
Somebody2godane: Yes please!
Considering the current political climate, having offsite storage of as much of NPR as possible seems like a very good idea.
[06:53]
godanegood news is i have up to dec 2006 on my drive [06:54]
Somebody2godane: nice
I'm adding a description to the collection.
which is how I came across it
[06:55]
SketchCowSomebody2: I've made the additions. [06:59]
Somebody2SketchCow: thanks; I'm making up another batch now.
https://archive.org/details/archiveteam_dutchnews&tab=about <- somehow ended up in a weird format.
and I screwed up the links to the archiveteam wiki.
on e.g. https://archive.org/details/archiveteam_fanfiction
[07:01]
godane: did you extract the UN Radio files from https://www.unmultimedia.org/radio/english/ or somewhere else?
and do you have any from 2017 to upload?
[07:13]
godanei only did up to 2016 [07:14]
Somebody2OK. [07:14]
godaneSketchCow: is there anyway to make the podcast-mirror be downloadable [07:17]
Somebody2SketchCow: here's six more: https://hastebin.com/babetusaho.php
and I'm going to sleep.
[07:17]
Coderjoi hate cloudflare? what? [07:18]
CoolCanukxD [07:18]
godanesomeone emailed me about Martin Yan shows and not being able to download them
https://archive.org/details/Martin_Yan_Shows
some try to download this: https://archive.org/download/Martin_Yan_Quick_and_Easy_S01E18/Martin_Yan_Quick_and_Easy_S01E18.avi
i can do it in firefox login
[07:19]
CoolCanuktakedown?
"The item is not available due to issues with the item's content."
[07:21]
godanebut wget gives me 403 forbidden item [07:22]
CoolCanukI can't get https://ia902303.us.archive.org/29/items/Martin_Yan_Quick_and_Easy_S01E10/Martin_Yan_Quick_and_Easy_S01E10.mp4 to work either hmmmm
a bunch of files dont let me download :/
[07:24]
godaneanyways i'm up to 1107k items now
1,107,509 items
[07:31]
..... (idle for 21mn)
***Aerochrom has joined #archiveteam-bs [07:53]
CoolCanuk:o [07:56]
............. (idle for 1h0mn)
***Stiletto has quit IRC (Ping timeout: 250 seconds) [08:56]
.... (idle for 18mn)
Mateon1 has quit IRC (Read error: Operation timed out)
Mateon1 has joined #archiveteam-bs
[09:14]
jschwart has joined #archiveteam-bs [09:23]
alfie has quit IRC (Ping timeout: 248 seconds)
alfie has joined #archiveteam-bs
jschwart has quit IRC (Read error: Operation timed out)
medowar has joined #archiveteam-bs
tuluu has quit IRC (Ping timeout: 248 seconds)
midas3 has quit IRC (Ping timeout: 248 seconds)
midas3 has joined #archiveteam-bs
tuluu has joined #archiveteam-bs
[09:37]
godanei'm doing the secrets of isis volume 1 tape [09:53]
***jtn2_ has joined #archiveteam-bs
Mateon1 has quit IRC (se.hub irc.efnet.nl)
Ceryn has quit IRC (se.hub irc.efnet.nl)
jtn2 has quit IRC (se.hub irc.efnet.nl)
yuitimoth has quit IRC (se.hub irc.efnet.nl)
SmileyG has quit IRC (se.hub irc.efnet.nl)
ez has quit IRC (se.hub irc.efnet.nl)
w0rp has quit IRC (se.hub irc.efnet.nl)
MrRadar2 has quit IRC (se.hub irc.efnet.nl)
second has quit IRC (se.hub irc.efnet.nl)
Fusl has quit IRC (se.hub irc.efnet.nl)
will has quit IRC (se.hub irc.efnet.nl)
Tenebrae has quit IRC (se.hub irc.efnet.nl)
Xibalba has quit IRC (se.hub irc.efnet.nl)
hook54321 has quit IRC (se.hub irc.efnet.nl)
pizzaiolo has joined #archiveteam-bs
i0npulse has quit IRC (Ping timeout: 248 seconds)
Mateon1 has joined #archiveteam-bs
yuitimoth has joined #archiveteam-bs
SmileyG has joined #archiveteam-bs
ez has joined #archiveteam-bs
w0rp has joined #archiveteam-bs
MrRadar2 has joined #archiveteam-bs
second has joined #archiveteam-bs
Fusl has joined #archiveteam-bs
will has joined #archiveteam-bs
Tenebrae has joined #archiveteam-bs
Xibalba has joined #archiveteam-bs
hook54321 has joined #archiveteam-bs
irc.efnet.nl sets mode: +o hook54321
i0npulse has joined #archiveteam-bs
[10:06]
jschwart has joined #archiveteam-bs
CoolCanuk has quit IRC (Quit: Connection closed for inactivity)
jschwart has quit IRC (Remote host closed the connection)
jschwart has joined #archiveteam-bs
[10:17]
.... (idle for 17mn)
IglooThe way back machine appears to be broken :< [10:42]
godaneso this is odd
the isis volume 1 tape has sort of a star trek blooper reel
at the end of it
[10:55]
***Ceryn has joined #archiveteam-bs [10:57]
.... (idle for 17mn)
SketchCowSomebody2: These are somewhat simplistic descriptions, but they're good placeholders, thank you. [11:14]
***ranavalon has joined #archiveteam-bs [11:21]
SketchCowgodane: When you wake up - I MIGHT have screwed up the Larry Sanders Show tape due to a bad regex. I'd re-upload for safety. [11:24]
godanedon't upload it
i'm uploading it right now
[11:26]
SketchCowyou misunderstand
By mistake, my uploader tried to upload it.
There's a chance it's truncated.
[11:26]
godaneok [11:26]
SketchCowso I'd say re-upload from scratch, it'll overwrite the thing
So when you get a chance, re-upload that file, so next time I overwrite the other
[11:26]
godanei'm uploading it RIGHT Now [11:27]
SketchCowI know [11:27]
godanei reconnected and renamed the file [11:29]
Igloogodane: upload that and put it on youtube :)
(Or link it here for the blooper reel!)
[11:29]
godanei will see about splitting the blooper reel from the isis volume 1 tape file
just know this tape is like 5 gens removed from masters it looks like
[11:31]
.... (idle for 18mn)
***BlueMaxim has quit IRC (Quit: Leaving) [11:50]
..... (idle for 22mn)
godanei'm uploading King of Iron Chefs Tournament Grand Finale Part 2 Food Network Tape
larry sanders show file is done
[12:12]
so i'm at 8501 items this month [12:21]
..... (idle for 23mn)
***marvinw has quit IRC (Leaving)
marvinw has joined #archiveteam-bs
dashcloud has quit IRC (Quit: No Ping reply in 180 seconds.)
dashcloud has joined #archiveteam-bs
Stilett0 has joined #archiveteam-bs
[12:44]
........ (idle for 37mn)
jrwrVHS Ripping, man [13:27]
SketchCowIt never ends [13:28]
godaneI'm uploading Buffy S07E01 Rough Cut And Haunted S01E01 Revised Pilot Tape [13:39]
...... (idle for 27mn)
jrwrgodane: did you raid like a horde of VHS Tapes from film studio dumpster? [14:06]
godaneno [14:11]
***Darkstar has quit IRC (No Ping reply in 180 seconds.) [14:11]
godanethis all from Jason Scott's boxes he mailed me [14:11]
SketchCowYeah
My fiancee worked for Entertainment Weekly in the early 2000s and got a lot of screener tapes.
[14:12]
godaneand whats funny is you got a Entertainment Weekly preview 1999 tape
which even though it was not touched it was unplayable
the tracking was not working at all on that
[14:19]
***Darkstar has joined #archiveteam-bs [14:20]
godaneanyways i resort some of the tape boxes so the one box i'm putting tapes has room [14:21]
....... (idle for 34mn)
jrwrNice, its great that you are finding all kinds of good shit, If you need help with anything, I'm game
I'm in Rochester NY now, so I'm not too far
[14:55]
SketchCowWe should just make you work for us at the Strong Museum
I have many things stored there, that need ripping/scanning
[15:05]
......... (idle for 43mn)
jrwrWhere is it at Sketch?
NVM
its here local
dude I'm game
I'm bored as shit on the weekends
[15:48]
***Aerochrom has quit IRC (Ping timeout: 248 seconds)
Aerochrom has joined #archiveteam-bs
[15:52]
godanei'm uploading Degrassi Junior High S01E01 to S01E12 The N Woc 2005-10
that one is going to take awhile cause its 6 hours
[15:53]
i still have about 125gb of digitized tapes to upload
i got maybe 16 tapes i can digitize
there is maybe another 5 tapes i can't do cause of damage or mold
i also have a 250gb of my own rips to upload
[16:01]
jrwrYa SketchCow hit me up if you need a man out in Rochester, I live in town and have lots of free time. My workplace used to do DR for a bank and has a TON of empty space just going to waste [16:07]
godaneabout 73gb is my family christmas tv recordings
one of them goes to week of 1988-11-28 to 1988-12-05
[16:08]
***Stilett0 has quit IRC (Read error: Operation timed out)
fie has quit IRC (Read error: Operation timed out)
[16:09]
RichardG_ has joined #archiveteam-bs
RichardG has quit IRC (Ping timeout: 250 seconds)
fie has joined #archiveteam-bs
Stilett0 has joined #archiveteam-bs
[16:23]
....... (idle for 32mn)
CoolCanuk has joined #archiveteam-bs [17:00]
CoolCanukIf there's anything I can do to help with the 508 error, let me know [17:01]
jrwrIts just on shared host, it does that [17:05]
CoolCanukI could provide hosting... assuming the user/authentication database is separate from the wiki.
probably has been posted, but http://www.dpconline.org/our-work/bit-list is now available
[17:08]
jrwrSketchCow has been hosting it for years, hes not moving any time soon [17:10]
CoolCanuk:) just thought i'd offer [17:10]
jrwrYa [17:10]
***schbirid2 has joined #archiveteam-bs [17:24]
username1 has quit IRC (Read error: Operation timed out) [17:30]
..... (idle for 23mn)
ranavalon has quit IRC (Read error: Connection reset by peer)
ranavalon has joined #archiveteam-bs
[17:53]
...... (idle for 28mn)
phuzion has quit IRC (Remote host closed the connection) [18:22]
....... (idle for 31mn)
jschwart has quit IRC (Quit: Konversation terminated!) [18:53]
jschwart has joined #archiveteam-bs [19:05]
...... (idle for 25mn)
icedice has joined #archiveteam-bs [19:30]
icedice has quit IRC (Ping timeout: 245 seconds) [19:36]
icedice has joined #archiveteam-bs [19:41]
schbirid2https://imgur.com/a/lOgWd "90s Stock Photography is the Bomb" [19:54]
CoolCanukschbirid2: reminds me of http://www.annualreports.com/HostedData/AnnualReportArchive/s/TSX_SEARF_1999.pdf ... their annual report looks like a highschool textbook mixed with windows 95 [19:57]
schbirid2wow [19:57]
CoolCanukwe should put this on our 508 page xD https://i.imgur.com/jzaikp2.jpg [20:01]
.... (idle for 16mn)
jrwrholy shit
I love it
[20:17]
CoolCanukdoes archivebot use ia useragent?
or robots.txt at all
[20:28]
.... (idle for 15mn)
jrwrit does not use the IA useragent
or robots.txt
[20:44]
CoolCanukperfect
!a https://partners.sears.ca/PAMP/Pages/Login/UserLogin.aspx
oops.
sorry
[20:44]
***TheLovina has joined #archiveteam-bs [20:47]
bithippoIs it possible to make an API call to the IA to check for items with a specific tag set to a specific value?
(ie check if item exists with "Originalurl" set to "$URL")
[20:50]
***acridAxid has quit IRC (marauder)
acridAxid has joined #archiveteam-bs
[20:51]
jrwrbithippo: maybe
https://archive.readme.io/docs
[20:53]
bithippoTesting https://archive.org/advancedsearch.php#raw out now
Extended tubeup to check for an item first before fetching from the source.
Extending*
[20:54]
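(Editor's sketch, not part of the log.) The lookup bithippo describes, checking whether an item already exists with a given metadata field set to a given value, can be expressed as a query against the advancedsearch.php endpoint linked above. The following is a minimal sketch under assumptions: the field name `originalurl` is taken from the chat and may not match the field a given uploader actually sets, and the helper names `build_search_url` and `identifiers_from_response` are hypothetical. It only builds the request URL and reads the `response.docs` list from the JSON output; it makes no network call itself.

```python
from urllib.parse import urlencode

ADVANCED_SEARCH = "https://archive.org/advancedsearch.php"


def build_search_url(field, value, rows=5):
    """Build an advancedsearch.php URL asking for items where field == value."""
    # Quote the value so a URL containing ':' and '/' is treated as one phrase.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    query = f'{field}:"{escaped}"'
    params = [
        ("q", query),
        ("fl[]", "identifier"),  # only return item identifiers
        ("rows", str(rows)),
        ("output", "json"),
    ]
    return f"{ADVANCED_SEARCH}?{urlencode(params)}"


def identifiers_from_response(payload):
    """Pull item identifiers out of a decoded JSON search response."""
    docs = payload.get("response", {}).get("docs", [])
    return [doc["identifier"] for doc in docs]
```

A caller would fetch the built URL, decode the JSON body, and skip the download from the source if `identifiers_from_response` returns a non-empty list.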
***ola_norsk has joined #archiveteam-bs [20:59]
jrwrcool [21:00]
ola_norskIs there a way to determine if two items contain the same file (regardless of filename) without having to download all the files from each item? Been looking at the .sqlite db for one, in hope to find some kind of checksum
is the IA sqlite db format documented somewhere?
[21:02]
bithippoEach item has a torrent file associated with it, could get the file hash from there. [21:04]
ola_norskthat contains the hash of each containing file? [21:04]
bithippoStandby, testing. [21:05]
ola_norskwhen looking at the sqlite, i note there's e.g "ETag: "3d6c17432d2dd1a48ee0664ca32caeff"" , though i don't know if thats a hash
of the file, i mean
[21:07]
bithippoI'm wrong, I thought you could get a hash of the file from the torrent file, but you need to retrieve the file first. I would _assume_ that ETag is a hash of some sort. [21:09]
ola_norskyeah, it does look to be unique for each file in an item as far i can tell so far. I found it in the s3api_per_key_metadata > headers
not "yeah, you are wrong", but that
took me a second to write that line, sorry
ill try downloading a file and see if its md5
wish the "headers" column wasnt so messy though lol
[21:11]
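(Editor's sketch, not part of the log.) The test ola_norsk proposes, downloading one file and seeing whether its MD5 equals the ETag value, could look roughly like this. It assumes the ETag is a plain hex MD5 wrapped in quotes, which is a common S3-style convention but is not confirmed anywhere in this log; the function names are hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 of a local file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def etag_matches_md5(etag_header, path):
    """Compare an ETag value (quotes and case ignored) with the file's MD5."""
    etag = etag_header.strip().strip('"').lower()
    return etag == md5_of_file(path)
```

If the comparison holds for a few downloaded files, the ETag column is probably usable as a per-file checksum; if it does not, the ETag is something else and should not be relied on for deduplication.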
bithippoAgree! And its okay, I don't mind being wrong :D [21:13]
ola_norskthat's the best kind of right there is :D [21:15]
bithippolol
Something cool, each file's crc32, md5, and sha1 hashes are available per item in the *_files.xml file within the item.
Example: https://ia801609.us.archive.org/30/items/youtube-ARrNYyJEnFI/youtube-ARrNYyJEnFI_files.xml
(xml format)
[21:15]
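(Editor's sketch, not part of the log.) Since each item's *_files.xml lists per-file hashes, two items can be compared without downloading their contents. The sketch below assumes each `<file>` element carries a `name` attribute and child elements such as `<sha1>` and `<md5>`, as in the example file linked above; the helper names are hypothetical, and fetching the two XML documents is left to the caller.

```python
import xml.etree.ElementTree as ET


def hashes_by_name(files_xml, algorithm="sha1"):
    """Map file name -> hash for every <file> entry that has that hash."""
    root = ET.fromstring(files_xml)
    result = {}
    for entry in root.iter("file"):
        value = entry.findtext(algorithm)
        if value:
            result[entry.get("name")] = value.strip().lower()
    return result


def shared_files(xml_a, xml_b, algorithm="sha1"):
    """Return (name_in_a, name_in_b) pairs whose hashes are identical."""
    by_hash = {}
    for name, digest in hashes_by_name(xml_a, algorithm).items():
        by_hash.setdefault(digest, []).append(name)
    matches = []
    for name, digest in hashes_by_name(xml_b, algorithm).items():
        for other in by_hash.get(digest, []):
            matches.append((other, name))
    return sorted(matches)
```

Because the match is on the hash rather than the file name, this also catches files that were renamed between the two items. Derivative files may legitimately differ between items, so comparing only entries marked as originals is likely the safer reading of the result.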
ola_norskthank you! [21:16]
bithippoHelpful if you know the item/item URL, but doesn't help if you want to search by hash, sorry about that! [21:18]
ola_norskim just trying to fix one item, where i accidentally uploaded some stuff from a previous item :]
though im unfortunately sure if the files are named the same, or how many of the first item ended up in the second item
[21:19]
***ranavalon has quit IRC (Read error: Connection reset by peer) [21:20]
ola_norsknot* [21:20]
***ranavalon has joined #archiveteam-bs [21:21]
ola_norskthe xml is gold stuff! so thanks again! :D [21:22]
bithippono trouble at all! [21:22]
***ranavalon has quit IRC (Remote host closed the connection)
ranavalon has joined #archiveteam-bs
[21:30]
ola_norskif i mv an (original) file from one item to another, the derivative files would get moved, deleted, regenerated as well? e.g would thumbnail files risk getting left behind?
would it perhaps be wiser to just do a "ia copy" and then "ia delete", instead of "ia mv" ?
[21:32]
***ranavalon has quit IRC (Read error: Connection reset by peer)
ranavalon has joined #archiveteam-bs
[21:35]
ola_norski saw in Internet Archive Help faq pages something about a button with "Re-Derive", but i've yet to see it on an item [21:37]
***jschwart has quit IRC (Konversation terminated!)
BlueMaxim has joined #archiveteam-bs
[21:37]
ola_norsk150.000 videos ... https://www.youtube.com/watch?v=Ih2CIdnHWms&ab_channel=MellisaHoneybeeZaccaria
i guess child exploitation was sorely rampant...
150K videos deleted. Though i have my doubts they were all "youtube kids videos targeted by child predators"
oh well
[21:40]
***ranavalon has quit IRC (Read error: Connection reset by peer)
ranavalon has joined #archiveteam-bs
[21:50]
ranavalon has quit IRC (Read error: Connection reset by peer)
ranavalon has joined #archiveteam-bs
[22:02]
ola_norskis it just me, or do the statistics of 270 youtube (video?) accounts terminated, and 150K videos deleted somewhat mismatch ? [22:08]
astridlots of videos per account, seems reasonable to me [22:08]
ola_norskin my experience it would be 250 videos per each of these accounts. Going by the first and foremost renowned ones
~250 to 300
but they certainly kept popping back up, that is undeniable
[22:10]
***bithippo has quit IRC (Max SendQ exceeded) [22:20]
ola_norskhopefully no "out baby lost his first tooth" family memorabilia got mingled into the debacle
our*
[22:21]
CoolCanukoooh that would be bad!
first steps, etc
[22:21]
ola_norskaye [22:22]
***RichardG_ is now known as RichardG
Pixi has quit IRC (Quit: Pixi)
Pixi has joined #archiveteam-bs
[22:26]
icedice2 has joined #archiveteam-bs
icedice has quit IRC (Ping timeout: 245 seconds)
[22:44]
odemgCan we please get on this because it's going to be a shit show if we don't do it properly.... https://www.reddit.com/r/DataHoarder/comments/7gce4n/psndl_being_shut_down_if_not_enough_donations/
arkiver schbirid2 chfoo balrog SketchCow ^^^
[22:51]
CoolCanukso they use that site instead of paying $13 a month?
i'd contact this person and just ask if he'd let us download the files... https://github.com/jamesst20
i dunno. Can you explain what PSNDL does?
[22:56]
odemgI have no idea, I'm not in the know about ps3 or whatever, as far as I can tell it points to the official cdn and isn't serving the package files himself, the site just acts like a portal, stick it on archivebot just to grab the site itself, if we must we can concern ourselves over the packages later on (hosted by sony, not going anywhere anytime soon) [22:59]
dxrtLooks like JAA already grabbed it with AB. [23:01]
..... (idle for 20mn)
***Stilett0 has quit IRC (Ping timeout: 250 seconds) [23:21]
Polylith has quit IRC (Quit: ZNC - http://znc.in)
icedice2 has quit IRC (Quit: Leaving)
[23:33]
ola_norskmay i ask what AB is? [23:36]
dxrtarchivebot [23:38]
ola_norskty
ola_norsk is archive team n00b :)
[23:38]
dxrt:) [23:39]
ola_norskthis reminds me, is there any way to circumvent waybackmachine captures that are blocked by robots.txt?
i actually think it was a news paper article that i tried to archive in relation to an item, that was blocked from being archived by their servers robots.txt
[23:40]
zinoDepends on how the data was grabbed. If archiveteam did it the raw warcs are probably around in a collection somewhere.
Archivebot does not care about robots.txt
[23:42]
ola_norskhow do i use the !ia ?
i remember i was told how a while back
!ia help
[23:43]
zinoYou mean !a or !ao? Those are archivebot commands. See #archivebot.
http://archivebot.readthedocs.io/en/latest/commands.html
[23:44]
CoolCanukhow can we archive twitter/fb?
pages/profiles
[23:55]
zino!a{o} https://twitter.com/somethingorother --igset twitter --phantomjs [23:58]
ola_norski've been running curl requests to web.archive.org/save/<url> to get the hashtags surrounding #netneutrality [23:58]
zinoAnd then pray that phantomjs works on the pipeline that it ends up on. [23:58]
ola_norskwould it do it twitter's own archives? [23:58]
zinoUnsure what that means, and I'm going to go sleep now. [23:59]
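
(Editor's sketch, not part of the log.) The curl requests to web.archive.org/save/<url> that ola_norsk mentions can be scripted for a list of pages. This is a rough sketch under assumptions: the hashtag URL pattern `https://twitter.com/hashtag/<tag>`, the User-Agent string, and the five-second pause are illustrative choices, not details from the log, and the function names are hypothetical.

```python
import time
import urllib.request

SAVE_PREFIX = "https://web.archive.org/save/"


def build_save_url(target_url):
    """Prefix a page URL with the save endpoint named in the chat."""
    return SAVE_PREFIX + target_url.strip()


def hashtag_urls(tags):
    """Turn hashtags into page URLs (URL pattern is an assumption)."""
    return [f"https://twitter.com/hashtag/{tag.lstrip('#')}" for tag in tags]


def submit_all(target_urls, delay_seconds=5.0):
    """Request a capture of each URL in turn and record the outcome."""
    statuses = {}
    for target in target_urls:
        request = urllib.request.Request(
            build_save_url(target),
            headers={"User-Agent": "save-sketch/0.1"},  # placeholder value
        )
        try:
            with urllib.request.urlopen(request, timeout=60) as response:
                statuses[target] = response.status
        except OSError as error:  # URLError and HTTPError are OSError subclasses
            statuses[target] = str(error)
        time.sleep(delay_seconds)  # pause between requests to stay polite
    return statuses
```

For example, `submit_all(hashtag_urls(["#netneutrality"]))` would request one capture and return its HTTP status or the error text. Pages that depend heavily on JavaScript may not capture well this way, which is the same concern zino raises about phantomjs.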
