godane: 'cause there is timecode burned into the video
Asparagir: JAA; Of course we're interested, why would you even ask. :-)
***: Asparagir has quit IRC (Asparagir)
vitzli has joined #archiveteam-bs
Baljem has quit IRC (Read error: Operation timed out)
godane: i'm capturing the 36th tape from Laughing Squid
SketchCow: at this rate i may have all tapes digitized in about a week
***: Odd0002_ has joined #archiveteam-bs
Odd0002 has quit IRC (Ping timeout: 600 seconds)
Odd0002_ is now known as Odd0002
dashcloud: @Stiletto Amazingly, I was able to find the article again: http://www.vintagecomputing.com/index.php/archives/1063/bringing-prodigy-back-from-the-dead
Stiletto: thanks so much :D
dashcloud: I can't wait to see all of the cool stuff you've found
***: username1 has joined #archiveteam-bs
schbirid2 has quit IRC (Read error: Operation timed out)
pizzaiolo has quit IRC (Quit: pizzaiolo)
vitzli has quit IRC (Quit: Leaving)
ranma: http://techreport.com/news/32659/pour-one-out-for-aol-instant-messenger
***: Sk1d has quit IRC (Ping timeout: 250 seconds)
Sk1d has joined #archiveteam-bs
superkuh: Yeah. I tried logging on to AIM just now for kicks. SSL error on login.
Can't seem to get online.
ICQ still works fine though. We'll always have ICQ.
ranma: hopefully
fie: zino, godane: I have a torrent site for "home-recordings" and "odd stuff like travel tapes"
superkuh, yeah you need an up-to-date client
superkuh: Makes sense. I'm using Pidgin 2.6.6 which is pretty old.
fie: I was afraid they were only going to allow official aim client but new pidgin works
gf just said they are shutting down now? wtf
damn you facebook
Why can't mozilla take it over or something
someone named Mental Elf messaged me...
nobody on my buddy list is ever signed on
godane: fie: is it societyglitch?
i have an account there
***: Stiletto has quit IRC ()
fie: godane, yes
Just don't know where I would source home movies and odd stuff... probably not ebay.
***: Stilett0- has joined #archiveteam-bs
TheLovina has quit IRC (Ping timeout: 370 seconds)
brayden has quit IRC (Ping timeout: 255 seconds)
brayden has joined #archiveteam-bs
swebb sets mode: +o brayden
JAA: Alright, so about wordpress.com: they have a link shortener, wp.me. The shortcode can have various different formats for linking to specific pages of a blog (e.g. directly to a post or an image attached to a post etc.). The format of main interest in this context, however, is simply the blog ID encoded in base62 ([0-9a-zA-Z]).
This shortening is provided by Jetpack, a Wordpress plugin installed and activated by default on all wordpress.com blogs (including free ones).
It seems that the maximum blog ID is currently somewhere just below 9g000, i.e. on the order of 135M shortcodes need to be scanned (9 * 62^4 + 16 * 62^3).
That's also the order of magnitude of how many blogs there are.
We could do this through URLTeam and then figure out what to do with it later.
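The shortcode arithmetic above can be sketched in Python. This is only an illustration, not URLTeam's actual scanner: the helper names `decode_base62` and `encode_base62` are made up here, and the digit order (0-9, then a-z, then A-Z) is an assumption taken from the character class and the `16 * 62^3` term quoted above.

```python
# Assumed digit order: 0-9, then a-z, then A-Z (so "g" has value 16).
ALPHABET = (
    "0123456789"
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
)
BASE = len(ALPHABET)  # 62


def decode_base62(code: str) -> int:
    """Turn a shortcode such as '9g000' into its integer blog ID."""
    value = 0
    for char in code:
        value = value * BASE + ALPHABET.index(char)
    return value


def encode_base62(value: int) -> str:
    """Turn a non-negative integer blog ID into its shortcode."""
    if value == 0:
        return ALPHABET[0]
    chars = []
    while value > 0:
        value, remainder = divmod(value, BASE)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


# Upper bound of the scan range mentioned in the discussion.
upper_bound = decode_base62("9g000")
print(upper_bound)  # 136800272, i.e. 9 * 62**4 + 16 * 62**3
```

Scanning would then amount to calling `encode_base62(i)` for every `i` below `upper_bound`, which is roughly 137 million candidates.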
***: dd0a13f37 has joined #archiveteam-bs
dd0a13f37: Where do I report security issues for archive.org?
username1: info@archive.org
***: dashcloud has quit IRC (Read error: Connection reset by peer)
username1: they can either forward it for you or give you direct contact
dd0a13f37: alright, thanks
***: dashcloud has joined #archiveteam-bs
username1: thank YOU
dd0a13f37: Not sure it's anything major, but better safe than sorry I guess
godane: SketchCow: you're getting a Showtime airing of Road to Wellville 'cause that's in the case of tapes
plus side is it got the most out of having 10000k setting being at 6.4gb
based on the Outer Limits preview, it aired on the week of 1996-04-05
it was a preview for the episode called "The Refuge" with actor M. Emmet Walsh
***: icedice has joined #archiveteam-bs
dd0a13f37: JAA: It's very close, well within an order of magnitude. Converting the latest shortlink to decimal and using it together with https://wordpress.com/activity/ to get an estimate
for posts/blog gives a result close to https://en.blog.wordpress.com/2015/01/06/2014-in-review/
Or the other way around: estimate the number of blogs from the 2014 posts/blog stats and the activity stats, convert to b64, note that it's close
b62*
JAA: Yeah, I ran a test with the two-character codes and almost all of them existed.
dd0a13f37: According to that, there should be (base62) 08 57 30 52 59 blogs (131913987)
Which is close to 9g000
JAA: Yup
dd0a13f37: Although it's not exact - if you manipulate the POST request from the stats page you can get a chart for the number of blogs which gives 125452778 (08 30 23 61 56) as total
Or maybe they subtract deleted blogs, in which case it makes perfect sense
4 billion posts, that's actually not a whole lot
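The two-digit groups quoted above (for example `08 57 30 52 59`) are the base-62 digit values of the blog counts. A small Python sketch can check them; the function names are hypothetical and only the numbers come from the conversation.

```python
def to_base62_digits(value: int) -> list[int]:
    """Split a non-negative integer into base-62 digit values, most significant first."""
    digits = []
    while True:
        value, remainder = divmod(value, 62)
        digits.append(remainder)
        if value == 0:
            break
    return digits[::-1]


def from_base62_digits(digits: list[int]) -> int:
    """Inverse of to_base62_digits."""
    value = 0
    for digit in digits:
        value = value * 62 + digit
    return value


def format_digits(digits: list[int]) -> str:
    """Render digit values as zero-padded groups, as written in the chat."""
    return " ".join(f"{d:02d}" for d in digits)


estimated_blogs = 131913987   # estimate from posts/blog statistics
chart_blogs = 125452778       # total from the stats-page chart
scan_limit = from_base62_digits([9, 16, 0, 0, 0])  # shortcode 9g000

print(format_digits(to_base62_digits(estimated_blogs)))  # 08 57 30 52 59
print(format_digits(to_base62_digits(chart_blogs)))      # 08 30 23 61 56
print(estimated_blogs / scan_limit)                      # roughly 0.96
```

Both quoted digit groups round-trip, and the estimate lands at about 96% of the 9g000 limit, which is what "close to 9g000" means here.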
***: K4k has joined #archiveteam-bs
BlueMaxim has quit IRC (Read error: Connection reset by peer)
Mateon1 has quit IRC (Ping timeout: 255 seconds)
Mateon1 has joined #archiveteam-bs
dd0a13f37: I don't think any of the libgen collections on IA are complete unless the logs have been tampered with. Should I upload it again?
***: dashcloud has quit IRC (Read error: Operation timed out)
icedice2 has joined #archiveteam-bs
icedice has quit IRC (Ping timeout: 260 seconds)
icedice2 has quit IRC (Client Quit)
icedice has joined #archiveteam-bs
JAA: Just to confirm: is gawker.com archived, and where can the archives be found? I saw several mentions of it in the logs, but I can't find it on IA. (Via: https://www.reddit.com/r/Archiveteam/comments/73xszd/has_gawker_been_fully_archived/ )
***: dashcloud has joined #archiveteam-bs
dd0a13f37: nvm, i found the real collection, up to r_2092000 is archived
***: username1 is now known as schbirid
schbirid: anyone know how to strip all formatting from a $msg in irssi perl scripting?
***: Rai-chan has joined #archiveteam-bs
RichardG_ has joined #archiveteam-bs
RichardG has quit IRC (Read error: Connection reset by peer)
dd0a13f37: Sci-mag is archived up to 64099999, foreignfiction up to 1600000
Foreignfiction goes up to 1890000, sci-mag torrents are down so not sure exactly how far they go
schbirid: are they surely fully archived? or just 50% stalled torrents?
dd0a13f37: The torrents are seeded, so I think they're archived. The ones that I checked were at least
schbirid: i meant to grab all of scimag but about iirc 25% of the ones i tried were not fully seeded :(
dd0a13f37: Ask on forums for reseed then
***: loadup has joined #archiveteam-bs
icedice2 has joined #archiveteam-bs
icedice has quit IRC (Ping timeout: 250 seconds)
kepler45 has quit IRC (Quit: Leaving)
pizzaiolo has joined #archiveteam-bs
Asparagir has joined #archiveteam-bs
svchfoo1 sets mode: +o Asparagir
icedice2 has quit IRC (Quit: Leaving)
icedice has joined #archiveteam-bs
TC01 has quit IRC (Remote host closed the connection)
icedice2 has joined #archiveteam-bs
icedice has quit IRC (Read error: Operation timed out)
Asparagir has quit IRC (Asparagir)
icedice2 has quit IRC (Ping timeout: 255 seconds)
JAA: For the record, we're now grabbing wp.me in URLTeam. :-)
***: icedice has joined #archiveteam-bs
dd0a13f37: So will you archive the whole of WP?
Somebody2: dd0a13f37: just the URLs, not their contents (at this point, at least)
dd0a13f37: In URLteam, yes, but for the WP project
Are they having any problems?
JAA: Not that I know of.
Somebody2: Not that I know of, but it's good to have a backup
JAA: But I figured, why the hell not?
Somebody2: heh, jinx
dd0a13f37: 4 bil posts, 1/4 have images, so 1 bil images, 1 MB each, 1 PB
Large endeavour
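That back-of-the-envelope figure can be reproduced as follows. Every input is the rough guess from the message above, not a measurement; the sketch additionally assumes one image per post that has any, and decimal units (1 MB = 10^6 bytes, 1 PB = 10^15 bytes). The function name is made up for illustration.

```python
def estimate_image_bytes(posts: int, posts_per_image_post: int, bytes_per_image: int) -> int:
    """Estimate total image storage.

    posts_per_image_post: one in this many posts carries an image (4 means 1/4).
    Assumes exactly one image per such post.
    """
    images = posts // posts_per_image_post
    return images * bytes_per_image


total_bytes = estimate_image_bytes(
    posts=4_000_000_000,        # "4 bil posts"
    posts_per_image_post=4,     # "1/4 have images"
    bytes_per_image=1_000_000,  # "1 MB each" (decimal megabyte, assumed)
)
print(total_bytes / 10**15)  # 1.0 (petabytes)
```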
***: dd0a13f37 has quit IRC (Ping timeout: 268 seconds)
icedice2 has joined #archiveteam-bs
dd0a13f37 has joined #archiveteam-bs
icedice has quit IRC (Ping timeout: 506 seconds)
icedice has joined #archiveteam-bs
dd0a13f37 has quit IRC (Ping timeout: 268 seconds)
dd0a has joined #archiveteam-bs
dd0a is now known as dd0a13f37
icedice2 has quit IRC (Ping timeout: 506 seconds)
icedice2 has joined #archiveteam-bs
icedice has quit IRC (Ping timeout: 245 seconds)
icedice has joined #archiveteam-bs
dd0a13f37 has quit IRC (Ping timeout: 268 seconds)
ajshell1 has quit IRC (Quit: Leaving)
icedice2 has quit IRC (Read error: Operation timed out)
atrocity has joined #archiveteam-bs
Atros has quit IRC (Ping timeout: 246 seconds)
icedice2 has joined #archiveteam-bs
icedice has quit IRC (Read error: Operation timed out)
dd0a13f37 has joined #archiveteam-bs
dd0a13f37: It sure is some improvement over proxy+webirc
***: ajshell1 has joined #archiveteam-bs
atrocity has quit IRC (Read error: Connection reset by peer)
atrocity has joined #archiveteam-bs
ajshell1 has quit IRC (Quit: Leaving)
godane: so i have this on tape from the box: https://en.wikipedia.org/wiki/Heat_and_Sunlight
digitizing it now
***: ajshell1 has joined #archiveteam-bs
ajshell1 has quit IRC (Client Quit)
ajshell1 has joined #archiveteam-bs
Stilett0- has quit IRC (Ping timeout: 260 seconds)
dashcloud has quit IRC (Remote host closed the connection)
dashcloud has joined #archiveteam-bs
TC01 has joined #archiveteam-bs
ajshell1 has quit IRC (Quit: Leaving)
ajshell1 has joined #archiveteam-bs
ajshell1 has quit IRC (Quit: Leaving)
kepler45 has joined #archiveteam-bs
ajshell1 has joined #archiveteam-bs
ajshell1 has quit IRC (Quit: Leaving)
kepler45 has quit IRC (Quit: Leaving)
odemg has quit IRC (Read error: Operation timed out)
odemg has joined #archiveteam-bs
dd0a13f37: A stripped-down version of archivebot for !ao, now that would be something. You could make it run much much faster if you can disregard certain constraints
JAA: You can't really ignore that much though. You still need to process images, stylesheets, scripts, etc.
An internet where everyone conforms to standards so we don't have to use parsers which are slowed down by all kinds of odd special cases, now that would be something.
dd0a13f37: Not always. And you could use another parser, like myhtml
Myhtml is fast, but there are no Python bindings
Somebody2: !ao jobs don't seem to be much of a bottleneck
JAA: Indeed, we rarely have a queue of !ao jobs.
And that's with only one !ao-only pipeline...
dd0a13f37: JAA: I don't think they parse inline scripts
JAA: dd0a13f37: wpull does not actually parse scripts, but it does process it and tries to extract links from it.
s/it/them/
Same with CSS, I believe.
Only HTML is parsed properly.
***: zino has quit IRC (Read error: Connection reset by peer)
zino has joined #archiveteam-bs
ajshell1 has joined #archiveteam-bs
ajshell1 has quit IRC (Client Quit)
icedice has joined #archiveteam-bs
icedice2 has quit IRC (Ping timeout: 260 seconds)
ajshell1 has joined #archiveteam-bs
icedice2 has joined #archiveteam-bs
icedice has quit IRC (Ping timeout: 260 seconds)
dashcloud has quit IRC (Read error: Operation timed out)
dashcloud has joined #archiveteam-bs
icedice has joined #archiveteam-bs